Live at https://sofar-finance.vercel.app
An AI-powered quantitative trading system combining LightGBM machine learning, 80 market signals, options analytics, and real-time market intelligence. Built on a Renaissance Technologies-inspired architecture of systematic signal discovery, non-linear feature engineering, and walk-forward validated predictions.
| Metric | Value |
|---|---|
| Walk-Forward Accuracy | 90.8% across 30 years (62 test windows) |
| Model | LightGBM v6 with 75 features |
| Signals | 80 total, 1.25M+ computed values |
| Options Data | 34.7M rows, 10 symbols, 2020-present |
| Greeks/IV Data | 1.85M rows with real implied volatility |
| Regime Accuracy | Pinned: 91.0% · Tension: 92.6% · Explosive: 92.2% |
| Confidence Calibration | 90-100% confidence = 98.9% accurate |
Data Ingestion (13 sources) → Signal Computation (80 signals)
→ Feature Engineering (75 features) → LightGBM Prediction
→ Trade Construction (specific options trades with Kelly sizing)
→ 7 daily AI Synthesis checkpoints → Frontend Dashboard
| Time | Action |
|---|---|
| Overnight | 4x global market scans (ES futures, Asia, Europe, VIX) |
| 7:00 AM | Morning brief — overnight thesis adjustment |
| 9:10 AM | Pre-market AI synthesis |
| 10:00 AM, 12:00 PM, 2:00 PM, 3:30 PM | Intraday AI synthesis |
| 4:40 PM | Full signal computation → LightGBM prediction → Trade construction |
| 6:00 PM | Evening AI synthesis with next-day analysis |
| Layer | Technology |
|---|---|
| ML Model | LightGBM (gradient boosted trees) |
| Signals | 80 signals: technical, options-derived, macro, alternative data |
| Options Data | ThetaData Terminal v3 (34.7M rows, real Greeks/IV) |
| AI Synthesis | Anthropic Claude (7 daily checkpoints) |
| Market Data | FMP, Yahoo Finance, FINRA, Polymarket |
| Frontend | Vanilla HTML/CSS/JS, Bloomberg-dark theme |
| Hosting | Vercel (static + serverless) |
| Database | Neon Postgres (cloud) |
| Infrastructure | Ubuntu WSL2, 40 cron jobs |
| Repo | GitHub: sofar-ai/sofar-finance |
Technical: RSI (5, 14, 21-day), MA position, Bollinger Band position, MACD, ATR regime, EMA trend, Stochastic K, Williams %R, CCI-20, ADX-14, ROC (5, 10, 20-day), OBV trend, volatility ratio, distance from MA200, range ratio
Options-derived: GEX regime, vol regime (explosive/pinned/tension/transitioning), options flow (put/call ratio)
Macro: VIX level, yield curve, news sentiment (Claude-scored), Polymarket macro composite, overnight global signal
Alternative data: FINRA dark pool short volume ratios, overnight market scanner (15 global markets)
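The RSI (5, 14, 21-day) signals listed above can be computed with Wilder's smoothing, the textbook RSI definition. The README does not say which variant the signal modules use, so the sketch below is an assumption; `wilder_rsi` is a hypothetical name, and returning 50 for a perfectly flat window is a convention, not a standard.

```python
from typing import List, Optional


def _rsi_from_averages(avg_gain: float, avg_loss: float) -> float:
    if avg_loss == 0.0:
        # No losses in the window: 100 if price rose, neutral 50 if flat (a convention).
        return 100.0 if avg_gain > 0.0 else 50.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


def wilder_rsi(closes: List[float], period: int = 14) -> List[Optional[float]]:
    """RSI with Wilder's smoothing; the first `period` entries are None (warm-up)."""
    if period < 1:
        raise ValueError("period must be >= 1")
    out: List[Optional[float]] = [None] * len(closes)
    if len(closes) <= period:
        return out
    changes = [b - a for a, b in zip(closes, closes[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]
    # Seed with simple averages over the first `period` price changes.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    out[period] = _rsi_from_averages(avg_gain, avg_loss)
    # Wilder's recursive smoothing: avg = (avg * (n - 1) + newest) / n.
    for i in range(period + 1, len(closes)):
        avg_gain = (avg_gain * (period - 1) + gains[i - 1]) / period
        avg_loss = (avg_loss * (period - 1) + losses[i - 1]) / period
        out[i] = _rsi_from_averages(avg_gain, avg_loss)
    return out
```

Calling `wilder_rsi(closes, 5)`, `wilder_rsi(closes, 14)` and `wilder_rsi(closes, 21)` would give the three RSI lengths.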
| Source | Data | Rows | Update |
|---|---|---|---|
| ThetaData v3 | Options EOD (OHLCV, OI, Greeks) | 34.7M | Daily + streaming |
| ThetaData v3 | Greeks/IV backfill | 1.85M | Completed 2020-present |
| FMP Stable API | Daily prices (19 tickers) | 112K+ | Daily 4:30 PM |
| FMP Stable API | Treasury rates | 7,800+ | Daily |
| FMP Stable API | Earnings calendar | 4,000+ | Daily |
| FMP Stable API | News headlines | On demand | 4x daily |
| FINRA ADF | Dark pool short volume | 752 days | Daily 5 PM |
| Polymarket | Prediction market probabilities | Daily | Daily 5:15 PM |
| Yahoo Finance | ES/NQ futures, global indices | 1,815 days | 4x overnight |
| RSS/Scrapers | Headlines (MarketWatch, WSJ, etc.) | Continuous | Every 6 hours |
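The FINRA dark pool short volume ratios can be derived from FINRA's daily short sale volume files. This sketch assumes the pipe-delimited layout with a `Date|Symbol|ShortVolume|ShortExemptVolume|TotalVolume|Market` header and uses the common definition ratio = ShortVolume / TotalVolume; the README states neither, and `short_volume_ratios` is a hypothetical helper.

```python
from typing import Dict


def short_volume_ratios(text: str) -> Dict[str, float]:
    """Map symbol -> ShortVolume / TotalVolume from a pipe-delimited FINRA-style file."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split("|")
    col = {name: i for i, name in enumerate(header)}
    ratios: Dict[str, float] = {}
    for ln in lines[1:]:
        fields = ln.split("|")
        if len(fields) != len(header):
            continue  # skip trailer or malformed lines
        total = float(fields[col["TotalVolume"]])
        if total > 0:
            ratios[fields[col["Symbol"]]] = float(fields[col["ShortVolume"]]) / total
    return ratios
```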
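Walk-forward validation: 62 rolling windows across 30 years of data. Each window trains on 504 days, tests on the following 126 days, and then steps 126 days forward. Because every test window begins after its training window ends, the model is never scored on data it was trained on (no look-ahead bias).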
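The window scheme above can be expressed as a small index generator over a chronologically sorted series. This is a sketch of rolling walk-forward splitting with the stated parameters, not the project's own code; `walk_forward_windows` is a hypothetical name. Under these parameters, 62 windows require at least 504 + 62 × 126 = 8,316 rows of history.

```python
from typing import Iterator


def walk_forward_windows(
    n_days: int, train: int = 504, test: int = 126, step: int = 126
) -> Iterator[tuple[range, range]]:
    """Yield (train_idx, test_idx) ranges over a chronologically sorted series."""
    start = 0
    # Stop once a full train + test window no longer fits in the history.
    while start + train + test <= n_days:
        train_idx = range(start, start + train)
        test_idx = range(start + train, start + train + test)
        yield train_idx, test_idx
        start += step  # roll both windows forward by one test period
```

Because `step` equals `test`, the test windows tile the history without overlap, so each out-of-sample day is scored exactly once.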
Performance by regime:

| Regime | Accuracy | Notes |
|---|---|---|
| Pinned | 91.0% | GEX-supported, mean reversion works |
| Tension | 92.6% | Best regime; resolution imminent |
| Explosive | 92.2% | High vol, model captures reversals |
| Transitioning | 85.4% | Smallest sample, lower but still strong |
Performance by confidence:

| Confidence | Accuracy | N predictions |
|---|---|---|
| 90-100% | 98.9% | 3,598 |
| 80-90% | 90.4% | 1,595 |
| 70-80% | 82.7% | 1,026 |
| 60-70% | 67.4% | 867 |
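A confidence breakdown like this comes from grouping out-of-sample predictions by confidence and measuring the hit rate in each group. The sketch below assumes each prediction is a `(confidence, correct)` pair with confidence in [0, 1]; the bucket edges mirror the table, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (lower bound, label) pairs matching the confidence buckets in the table.
_BUCKETS = ((0.9, "90-100%"), (0.8, "80-90%"), (0.7, "70-80%"), (0.6, "60-70%"))


def bucket_label(confidence: float) -> str:
    for lower, label in _BUCKETS:
        if confidence >= lower:
            return label
    return "<60%"


def calibration_table(preds: Iterable[Tuple[float, bool]]) -> Dict[str, Tuple[float, int]]:
    """Map each confidence bucket to (accuracy, number of predictions)."""
    hits: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for confidence, correct in preds:
        label = bucket_label(confidence)
        counts[label] += 1
        hits[label] += int(correct)
    return {label: (hits[label] / counts[label], n) for label, n in counts.items()}
```

Swapping `bucket_label(confidence)` for the regime label of each prediction would produce the per-regime accuracy table in the same way.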
Top features by importance:
The model learned that momentum trajectories (where an indicator was yesterday versus where it is today) are more predictive than point-in-time snapshots, which is why lagged features dominate the top 5.
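To illustrate what "lagged features" means here, the sketch below turns one indicator series into its current value, its value k days earlier, and the change between them. The column naming (`rsi_14_lag1`, `rsi_14_chg1`) and the `lagged_features` helper are hypothetical; the README does not list the actual lag lengths.

```python
from typing import Dict, List, Optional, Sequence


def lagged_features(
    name: str, series: Sequence[float], lags: Sequence[int] = (1, 5)
) -> List[Dict[str, Optional[float]]]:
    """One feature row per day: today's value, each lagged value, and the change since then."""
    rows: List[Dict[str, Optional[float]]] = []
    for t, value in enumerate(series):
        row: Dict[str, Optional[float]] = {name: value}
        for k in lags:
            past = series[t - k] if t >= k else None  # None until k days of history exist
            row[f"{name}_lag{k}"] = past
            row[f"{name}_chg{k}"] = value - past if past is not None else None
        rows.append(row)
    return rows
```

If the model is trained with the `lightgbm` package, `Booster.feature_importance(importance_type="gain")` returns per-feature gain, which is one way to produce a top-features ranking.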
Converts LightGBM predictions into specific, risk-defined options trades:
LightGBM: BEARISH 65% → Regime: tension → Hold: 3 days
→ Bear Put Spread: Buy $660P / Sell $645P (Mar 23 exp)
→ Debit: $4.24 | Max Profit: $10.76 | Reward/Risk: 2.54x
→ Kelly sizing: Aggressive 12.5% | Conservative 1.2%
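The spread figures in the example follow from simple arithmetic: width = 660 − 645 = 15, max profit = 15 − 4.24 = 10.76, and 10.76 / 4.24 ≈ 2.54 (max profit over max loss, i.e. reward-to-risk). The sketch below reproduces those numbers and adds the textbook Kelly fraction for an all-or-nothing bet, f* = p − (1 − p) / b. The README does not say which win probability or Kelly variant produces its 12.5% and 1.2% figures, so `kelly_fraction` is an assumption: feeding in the 65% model confidence directly gives about 51%, so the quoted sizes presumably use a different probability input or a scaled-down Kelly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BearPutSpread:
    long_strike: float   # put bought (higher strike)
    short_strike: float  # put sold (lower strike)
    debit: float         # net premium paid per share

    def __post_init__(self) -> None:
        if not self.long_strike > self.short_strike:
            raise ValueError("long strike must be above short strike")
        if not 0 < self.debit < self.width:
            raise ValueError("debit must be positive and below the strike width")

    @property
    def width(self) -> float:
        return self.long_strike - self.short_strike

    @property
    def max_profit(self) -> float:
        return self.width - self.debit  # underlying at or below the short strike at expiry

    @property
    def reward_to_risk(self) -> float:
        return self.max_profit / self.debit  # max loss equals the debit

    @property
    def breakeven(self) -> float:
        return self.long_strike - self.debit


def kelly_fraction(p_win: float, reward_to_risk: float) -> float:
    """Binary-bet Kelly: f* = p - (1 - p) / b, floored at zero (no short bets)."""
    if not 0.0 <= p_win <= 1.0 or reward_to_risk <= 0:
        raise ValueError("need 0 <= p_win <= 1 and reward_to_risk > 0")
    return max(0.0, p_win - (1.0 - p_win) / reward_to_risk)
```

Treating a vertical spread as win-max-or-lose-debit ignores partial outcomes between the strikes, which is one reason practitioners often size at a fraction of full Kelly.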
Strike selection: Uses GEX levels (put wall, call wall) as support/resistance for strike placement.
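One way to locate the put and call walls is the common dealer-gamma convention: per-strike exposure = gamma × open interest × 100 × spot² × 0.01 (dollar gamma per 1% move), with call exposure counted positive and put exposure negative. The put wall is then the strike with the largest put exposure and the call wall the strike with the largest call exposure. The README does not give the project's formula or its strike rule, so both functions below are hypothetical; the rule shown buys the strike nearest spot and sells the highest listed strike at or below the put wall.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

CONTRACT_MULTIPLIER = 100  # standard US equity option multiplier


@dataclass(frozen=True)
class OptionRow:
    strike: float
    is_call: bool
    gamma: float        # per-share gamma from the Greeks feed
    open_interest: int


def gamma_walls(rows: Iterable[OptionRow], spot: float) -> Tuple[float, float, float]:
    """Return (put_wall, call_wall, net_gex) under the common dealer-gamma convention."""
    call_gex: Dict[float, float] = defaultdict(float)
    put_gex: Dict[float, float] = defaultdict(float)
    for r in rows:
        # Dollar gamma per 1% move in the underlying.
        exposure = r.gamma * r.open_interest * CONTRACT_MULTIPLIER * spot * spot * 0.01
        (call_gex if r.is_call else put_gex)[r.strike] += exposure
    if not call_gex or not put_gex:
        raise ValueError("need both call and put rows")
    call_wall = max(call_gex, key=call_gex.__getitem__)
    put_wall = max(put_gex, key=put_gex.__getitem__)
    net_gex = sum(call_gex.values()) - sum(put_gex.values())
    return put_wall, call_wall, net_gex


def bear_put_strikes(strikes: Iterable[float], spot: float, put_wall: float) -> Tuple[float, float]:
    """Hypothetical rule: long the strike nearest spot, short the highest strike at or below the put wall."""
    listed = sorted(set(strikes))
    long_k = min(listed, key=lambda k: abs(k - spot))
    shorts = [k for k in listed if k <= put_wall and k < long_k]
    if not shorts:
        raise ValueError("no listed strike at or below the put wall")
    return long_k, max(shorts)
```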
Regime-adaptive holding:

| Regime | Optimal Hold | Sharpe | Strategy |
|---|---|---|---|
| Pinned | 1 day | 0.99 | Quick scalp, mean reversion |
| Tension | 3 days | 1.71 | Swing hold, best risk-adjusted |
| Explosive | 15 days | 0.52 | Patience, buy the crash |
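The holding rule amounts to a lookup from regime to a number of trading days. The sketch below turns that into a planned exit date by counting weekdays forward; it ignores exchange holidays, and because the table has no "transitioning" row, the hypothetical `HOLD_DAYS` mapping raises `KeyError` for that regime rather than guessing.

```python
from datetime import date, timedelta

# Optimal holds from the regime table; "transitioning" is deliberately absent.
HOLD_DAYS = {"pinned": 1, "tension": 3, "explosive": 15}


def planned_exit(entry: date, regime: str) -> date:
    """Count forward HOLD_DAYS[regime] weekdays from the entry date."""
    remaining = HOLD_DAYS[regime]
    day = entry
    while remaining > 0:
        day += timedelta(days=1)
        if day.weekday() < 5:  # Monday=0 through Friday=4; holidays not handled
            remaining -= 1
    return day
```

For example, a tension-regime entry on Friday, March 20, 2026 gives a planned exit of Wednesday, March 25.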
| Page | Description |
|---|---|
| Market | Real-time quotes, charts, rates, headlines |
| Options Flow | ThetaData streaming tape, sorted by premium |
| AI Analysis | LightGBM prediction, regime badge, trade recs, quant section |
| Daily Summary | End-of-day analysis and recap |
| Ticker Deep Dives | Single-ticker AI analysis with input |
| Macro Events | Event tree tracking (Iran conflict, tariffs, etc.) |
| Performance | Prediction accuracy tracking and backchecking |
| Health | System diagnostics for all data feeds and services |
| Research | AI research lab and signal discovery |
| Table | Rows |
|---|---|
| options_eod | 34.7M |
| signal_values | 1.25M |
| prices_daily | 112K |
| gex_historical | 4,682 |
| dark_pool_volume | 752 |
| experiments | 28 |
Two paths for finding new alpha:
An autonomous signal discovery loop, adapted from the R&D-Agent-Quant and Chain-of-Alpha architectures, proposes a signal, implements it, backtests it with LightGBM walk-forward validation, and then promotes or rejects it. 28 experiments have been run so far, and their results accumulate in a knowledge base.
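The promote-or-reject step of such a loop can be sketched as: evaluate the current feature set, evaluate it again with the candidate signal added, and keep the candidate only if walk-forward accuracy improves by some margin. The `evaluate` hook, the 0.002 threshold, and the class names below are assumptions for illustration; signal proposal and implementation (the LLM-driven parts) are out of scope here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Experiment:
    signal: str
    accuracy: float
    promoted: bool


@dataclass
class DiscoveryLoop:
    # Hypothetical hook: returns walk-forward accuracy for a list of feature names.
    evaluate: Callable[[List[str]], float]
    features: List[str]
    min_gain: float = 0.002  # assumed promotion threshold (0.2 accuracy points)
    knowledge_base: List[Experiment] = field(default_factory=list)
    baseline: float = field(init=False)

    def __post_init__(self) -> None:
        self.baseline = self.evaluate(self.features)

    def try_signal(self, signal: str) -> bool:
        """Backtest the feature set plus one candidate; promote it if accuracy improves enough."""
        candidate = self.features + [signal]
        accuracy = self.evaluate(candidate)
        promoted = accuracy - self.baseline >= self.min_gain
        if promoted:
            self.features = candidate
            self.baseline = accuracy
        # Every outcome, including rejections, is recorded in the knowledge base.
        self.knowledge_base.append(Experiment(signal, accuracy, promoted))
        return promoted
```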
| Date | Model | Accuracy | Signals |
|---|---|---|---|
| Mar 20 (Fri) | Linear weighted v5 | 51.2% | 6 |
| Mar 21 (Sat) | LightGBM v2 | 73.6% | 10 |
| Mar 21 (Sat) | LightGBM v3 | 81.2% | 10 |
| Mar 21 (Sat) | LightGBM v4 | 82.2% | 11 |
| Mar 21 (Sat) | LightGBM v5 | 83.6% | 24 |
| Mar 22 (Sun) | LightGBM v6 | 90.8% | 75 |
/etc/neon.env DATABASE_URL
/etc/anthropic.env ANTHROPIC_API_KEY
/etc/fmp.env FMP_API_KEY
~/sofar-finance/ Frontend repo (Vercel deployment)
~/scripts/ All backend scripts
~/scripts/signals/ Signal computation modules
~/scripts/models/ LightGBM model files
~/logs/ Cron job logs
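Backend scripts need to read the keys kept in `/etc/neon.env`, `/etc/anthropic.env` and `/etc/fmp.env`. Assuming those files use shell-style `KEY=VALUE` lines (the README does not show their contents), a minimal stdlib loader could look like this; `parse_env_text` and `load_env_file` are hypothetical helpers, not part of the repo.

```python
from pathlib import Path
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse shell-style KEY=VALUE lines, ignoring blanks and # comments."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]  # drop one pair of matching quotes
        env[key.strip()] = value
    return env


def load_env_file(path: str) -> Dict[str, str]:
    return parse_env_text(Path(path).read_text())
```

Usage would be along the lines of `load_env_file("/etc/neon.env")["DATABASE_URL"]`.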