Market Regime Detector
Cross-Asset Sentiment-Based Market Regime Classification
01 The Problem
The market regime (risk-on, risk-off, or transitional) is one of the most consequential inputs in systematic portfolio management. Traditional regime detection relies on lagging price-based indicators such as moving averages and volatility measures, which describe what has already happened rather than how market participants currently perceive risk.
The hypothesis: the language in financial news, earnings calls, and regulatory filings contains forward-looking sentiment signals that lead price-based indicators. If we can extract and aggregate this sentiment across asset classes, we can classify market regimes in near real-time.
02 The Approach
We built a pipeline that ingests financial text from three sources: financial news (Reuters and Bloomberg headlines, pulled via API), earnings call transcripts, and SEC 10-K/10-Q filings retrieved through the EDGAR API. Text is preprocessed, split into sentences, and run through FinBERT, a BERT model fine-tuned on financial text, to produce a sentiment score for each sentence.
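To make "sentiment score" concrete, here is a minimal sketch of one way to collapse a three-class sentiment output into a single signed number per sentence. It assumes the model emits logits for positive, negative, and neutral classes in that order (check the model's `id2label` config before relying on this), and it assumes the score is defined as P(positive) minus P(negative). Both the label order and the scoring rule are illustrative assumptions, not the pipeline's confirmed method.

```python
import math

# Assumed label order; verify against the model's id2label mapping.
LABELS = ("positive", "negative", "neutral")


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def signed_score(logits, labels=LABELS):
    """Hypothetical scoring rule: P(positive) - P(negative), in [-1, 1]."""
    probs = dict(zip(labels, softmax(logits)))
    return probs["positive"] - probs["negative"]
```

Under this rule a confidently neutral sentence scores near zero, so it neither adds to nor subtracts from an aggregate, while strongly positive or negative sentences approach +1 or -1.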
Sentence-level scores are aggregated into daily sentiment signals per asset class (equities, credit, rates, commodities). These signals, combined with traditional volatility and correlation features, feed a classification model that assigns the current market to one of four regimes: Risk-On, Risk-Off, Recovery, or Stress.
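The aggregation step can be sketched as a group-by over (date, asset class). The version below uses only the standard library and an unweighted mean. The record layout, the function name `daily_signals`, and the choice of a simple mean are assumptions for illustration; the actual pipeline may weight by source or recency.

```python
from collections import defaultdict
from statistics import fmean


def daily_signals(records):
    """Average sentence scores per (date, asset_class).

    records: iterable of (date_string, asset_class, score) tuples.
    Returns a dict keyed by (date_string, asset_class).
    """
    buckets = defaultdict(list)
    for day, asset_class, score in records:
        buckets[(day, asset_class)].append(score)
    return {key: fmean(scores) for key, scores in sorted(buckets.items())}


# Made-up scores, for illustration only.
example = [
    ("2024-03-01", "equities", 0.5),
    ("2024-03-01", "equities", -0.25),
    ("2024-03-01", "credit", -0.5),
    ("2024-03-04", "equities", 0.75),
]
signals = daily_signals(example)
```

With Pandas, as listed in the stack, the same operation would be a `groupby` on the date and asset-class columns followed by a mean.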
The system was deployed on AWS SageMaker with a scheduled inference pipeline running nightly. A Plotly-based dashboard surfaces the current regime classification, historical regime transitions, and the top contributing sentiment signals.
03 The Stack
- FinBERT (HuggingFace) — financial sentiment classification on raw text
- AWS SageMaker — model hosting, scheduled batch inference, experiment tracking
- SEC EDGAR API — programmatic access to 10-K/10-Q filings
- Pandas + NumPy — signal aggregation, feature engineering, regime labeling
- Plotly — interactive regime visualization dashboard
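As an illustration of the EDGAR item above, the sketch below builds the URL for a company's submissions feed on data.sec.gov and filters an already-downloaded response down to 10-K and 10-Q filings. It assumes the response keeps recent filings as parallel arrays under `filings.recent`; the helper names are hypothetical, and the download itself is left out (the SEC asks automated clients to send a descriptive User-Agent header).

```python
def submissions_url(cik):
    """URL of the submissions feed; the CIK is zero-padded to 10 digits."""
    return f"https://data.sec.gov/submissions/CIK{int(cik):010d}.json"


def periodic_filings(submissions, forms=("10-K", "10-Q")):
    """Keep only the requested form types from a parsed submissions response.

    Assumes submissions["filings"]["recent"] holds parallel lists named
    "form", "filingDate", and "accessionNumber".
    """
    recent = submissions["filings"]["recent"]
    rows = zip(recent["form"], recent["filingDate"], recent["accessionNumber"])
    return [
        {"form": form, "filed": filed, "accession": accession}
        for form, filed, accession in rows
        if form in forms
    ]
```

Keeping the filter separate from the HTTP call makes it easy to exercise against a saved response without touching the network.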
04 The Outcome
The system produces daily regime classifications with interpretable signal attribution — traders and portfolio managers can see not just the regime label, but which sectors and source documents drove the classification. This interpretability is critical in financial applications where "black box" outputs are operationally unacceptable.
In backtests, the text-based regime signals consistently led price-based indicators around major market transitions, which supports the core hypothesis that text sentiment precedes price movement during regime shifts.
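One simple way to quantify lead time is to compare the first date on which each signal enters a given regime. The sketch below assumes both signals are available as date-sorted lists of (date, label) pairs; the function names and the sample series are hypothetical and synthetic, not backtest results.

```python
from datetime import date


def first_entry(series, regime):
    """First date on which the series carries the given regime label."""
    return next((day for day, label in series if label == regime), None)


def lead_days(text_series, price_series, regime):
    """Days by which the text-based signal entered the regime first.

    Positive means the text-based signal moved earlier; None means at
    least one series never entered the regime.
    """
    text_day = first_entry(text_series, regime)
    price_day = first_entry(price_series, regime)
    if text_day is None or price_day is None:
        return None
    return (price_day - text_day).days


# Synthetic series, for illustration only.
text_based = [(date(2024, 1, 2), "Risk-On"), (date(2024, 1, 3), "Stress")]
price_based = [(date(2024, 1, 2), "Risk-On"), (date(2024, 1, 8), "Stress")]
lead = lead_days(text_based, price_based, "Stress")
```

A fuller backtest would repeat this for every transition and look at the distribution of leads rather than a single episode.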