The Geopolitical Oil Risk (GOR) Index is a composite risk score from 0.0 to 10.0 that quantifies the real-time threat level to global oil supply. It fuses intelligence from 18 data sources across six signal dimensions: NLP-scored events, maritime chokepoint monitoring, market stress indicators, satellite/seismic detection, trend momentum, and infrastructure damage.
The index is recomputed every 5 minutes and drives a five-tier regime classification with hysteresis to prevent rapid oscillation between states.
The GOR Index separates reality into two tracks that behave differently over time:
Flow Track — decays with events
State Track — persistent physical reality
Why two tracks?
Flow signals capture what is happening right now — news events, market reactions, maritime anomalies. These should decay. When the news cycle quiets and markets stabilise, the flow score falls. Absence of new events is itself information.
State signals capture what the world actually is — confirmed physical damage to facilities, persistent supply capacity lost. These should not decay because media moved on. If South Pars is offline and no journalist reports on it tomorrow, the physical damage is still there.
The state floor
The index cannot fall below 60% of the state score. At maximum infrastructure damage (score 10.0), the floor is 6.0 — C_ELEVATED — the minimum appropriate regime when that much supply capacity is offline. At minor damage (score 2.0), the floor is 1.2 — negligible. The floor scales with actual severity.
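The floor rule can be sketched in a few lines. This is a minimal illustration, assuming the composite is computed first and the floor is applied afterwards; the names `apply_state_floor`, `raw_index`, and `state_score` are hypothetical, and clamping the result to the 0-10 scale is an assumption.

```python
STATE_FLOOR_RATIO = 0.60  # from the text: the floor is 60% of the state score


def apply_state_floor(raw_index: float, state_score: float) -> float:
    """Return the index after enforcing the state floor (hypothetical helper)."""
    floor = STATE_FLOOR_RATIO * state_score
    # Assumption: the final value stays on the documented 0-10 scale.
    return min(10.0, max(raw_index, floor))


# Quiet news cycle (raw index 3.5) while the state score is at its maximum:
# the floor holds the index at about 6.0.
floored = apply_state_floor(3.5, 10.0)

# Minor damage (state score 2.0): the floor of about 1.2 does not bind.
unfloored = apply_state_floor(3.5, 2.0)
```

The floor only binds when the flow-driven value has decayed below it, so a quiet news cycle cannot pull the index under the level implied by confirmed damage.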
Why does infrastructure dominate the state track (85%)?
The facility damage registry — built from 350+ strategic facilities with confidence scores, production impact percentages, and recovery timelines — is cross-checked by the verifier agent every 3 hours. It is the closest thing this system has to ground truth about the world's physical state.
Why is EIA stock data not in the state track?
EIA measures US domestic crude inventories. During a Hormuz closure, US stocks may build (comfortable domestic supply) while the strait is shut. Using this as a state signal would actively lower the state floor — the wrong direction. EIA data remains in the flow track via the NLP subindex.
The NLP subindex aggregates LLM-scored events from the last 4 hours. Each event is rated on escalation (1–10), disruption probability (0–1), actor capability, and infrastructure specificity.
Tier multipliers: Tier 1 keywords = 3.0×, Tier 2 = 1.5×, unmatched = 1.0×. Stage weights: unconfirmed = 0.70×, corroborated = 1.00×, verified = 1.15×, expired = 0.30×. Disruption boost activates when probability > 0.65 with infrastructure specificity.
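The multipliers above can be illustrated with a short sketch. It assumes that the tier multiplier and stage weight combine by multiplication and that infrastructure specificity is available as a boolean flag; neither is stated in the text, the function names are hypothetical, and the size of the disruption boost is not given here, so only its trigger condition is shown.

```python
from typing import Optional

# Values taken from the text above.
TIER_MULTIPLIER = {1: 3.0, 2: 1.5}
UNMATCHED_MULTIPLIER = 1.0
STAGE_WEIGHT = {
    "unconfirmed": 0.70,
    "corroborated": 1.00,
    "verified": 1.15,
    "expired": 0.30,
}


def event_weight(keyword_tier: Optional[int], stage: str) -> float:
    """Combined weight for one event (assumption: the factors multiply)."""
    tier_factor = TIER_MULTIPLIER.get(keyword_tier, UNMATCHED_MULTIPLIER)
    return tier_factor * STAGE_WEIGHT[stage]


def disruption_boost_active(disruption_probability: float,
                            infrastructure_specific: bool) -> bool:
    """Trigger condition only; the boost magnitude is not given in the text."""
    return disruption_probability > 0.65 and infrastructure_specific


# A verified Tier 1 event carries roughly 3.0 x 1.15 = 3.45 times the base weight.
verified_tier1 = event_weight(1, "verified")
```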
The maritime signal monitors Strait of Hormuz transits and Bab al-Mandab diversions via AIS vessel tracking. Low transit z-scores indicate blockage risk.
Hormuz score inverts the z-score: negative z (fewer transits) = higher risk. Red Sea diversion percentage is estimated from Houthi/Red Sea event frequency over 7 days.
Maritime Intelligence — AIS Tracking
Vessel positions are sourced via aisstream.io WebSocket streaming, monitoring two chokepoint bounding boxes simultaneously: the Strait of Hormuz (latitude 25–27°N, longitude 56–58°E) and Bab al-Mandab (latitude 11.5–13°N, longitude 43–45°E). Each vessel is tagged to its transit zone and tracked independently in Redis.
Two AIS message types are processed. PositionReport (Types 1/2/3) is broadcast every few minutes per vessel and provides position, speed over ground (SOG), course, and navigation status. ShipStaticData (Type 5) is broadcast alongside position reports and provides vessel name, dimensions, and MaximumStaticDraught, the key cargo signal.
Vessel Filtering
Only vessels matching both criteria are counted:
This combination reliably identifies VLCCs and large Suezmax tankers.
Laden / Ballast Classification
Draught is how deep the hull sits in the water. A VLCC fully loaded with ~2 million barrels of crude oil sits ~20–21 metres deep. The same vessel returning empty (in ballast) sits ~10–12 metres deep.
Classification threshold: 16 metres
Barrel estimate: laden_count × 2,000,000 barrels (approximate VLCC capacity)
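The classification and barrel estimate can be sketched as follows. This is an illustration under stated assumptions: the 16-metre threshold is treated as inclusive for laden vessels, a draught of `None` or 0 is assumed to mean the value is missing, and the function names are hypothetical.

```python
from typing import Iterable, Optional

LADEN_DRAUGHT_THRESHOLD_M = 16.0    # classification threshold from the text
BARRELS_PER_LADEN_VLCC = 2_000_000  # approximate VLCC capacity from the text


def classify_draught(draught_m: Optional[float]) -> str:
    """Return 'laden', 'ballast', or 'unknown' for one vessel."""
    # Assumption: None or 0 means the draught was not reported.
    if draught_m is None or draught_m <= 0:
        return "unknown"
    # Assumption: exactly 16 m counts as laden.
    return "laden" if draught_m >= LADEN_DRAUGHT_THRESHOLD_M else "ballast"


def summarise_transits(draughts_m: Iterable[Optional[float]]) -> dict:
    """Count transits and estimate barrels from the laden count."""
    counts = {"laden": 0, "ballast": 0, "unknown": 0}
    total = 0
    for draught in draughts_m:
        counts[classify_draught(draught)] += 1
        total += 1
    return {
        "transits": total,  # vessels without draught still count here
        **counts,
        "estimated_barrels": counts["laden"] * BARRELS_PER_LADEN_VLCC,
    }


summary = summarise_transits([20.5, 11.0, None, 19.8])
```

In this example the vessel without a draught value is included in the transit total but left out of the laden/ballast split.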
Limitations
Vessels with AIS transponders disabled (shadow fleet) are not counted. Draught data may be absent for some vessels — these are counted in the transit total but excluded from the laden/ballast breakdown. Laden/ballast classification applies to Hormuz transits only. 24-hour rolling count: each unique MMSI tracked via Redis with 24-hour TTL per zone.
Bab al-Mandab AIS Tracking
The Bab al-Mandab strait (latitude 11.5–13°N, longitude 43–45°E) is monitored via the same aisstream.io WebSocket connection as Hormuz. VLCC transits (ship type 80–89, length ≥ 280m) are counted over a 24-hour rolling window using Redis TTL per MMSI.
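A minimal sketch of the zone tagging, vessel filter, and 24-hour unique-MMSI count follows. It uses an in-memory dictionary in place of Redis so that it is self-contained; the class and function names are hypothetical, and refreshing a vessel's expiry each time it is seen is an assumption about how the TTL is applied.

```python
from typing import Dict, Optional, Tuple

# Bounding boxes from the text: (lat_min, lat_max, lon_min, lon_max).
ZONES = {
    "hormuz": (25.0, 27.0, 56.0, 58.0),
    "bab_al_mandab": (11.5, 13.0, 43.0, 45.0),
}
WINDOW_SECONDS = 24 * 3600


def zone_for(lat: float, lon: float) -> Optional[str]:
    """Tag a position with its chokepoint zone, or None if outside both."""
    for name, (lat_min, lat_max, lon_min, lon_max) in ZONES.items():
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max:
            return name
    return None


def is_large_tanker(ship_type: int, length_m: float) -> bool:
    """Filter from the text: ship type 80-89 and length of at least 280 m."""
    return 80 <= ship_type <= 89 and length_m >= 280


class RollingTransitCounter:
    """In-memory stand-in for the per-zone, per-MMSI keys with a 24-hour TTL."""

    def __init__(self) -> None:
        self._last_seen: Dict[Tuple[str, int], float] = {}

    def record(self, zone: str, mmsi: int, timestamp: float) -> None:
        # Assumption: each new sighting refreshes the vessel's expiry.
        self._last_seen[(zone, mmsi)] = timestamp

    def count(self, zone: str, now: float) -> int:
        return sum(
            1
            for (seen_zone, _), seen_at in self._last_seen.items()
            if seen_zone == zone and now - seen_at < WINDOW_SECONDS
        )
```

Keying on the (zone, MMSI) pair means a vessel that passes both chokepoints within a day is counted once in each zone, which matches the independent per-zone tracking described earlier.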
Z-scores are computed against a 7-year daily baseline from IMF PortWatch (portwatch.imf.org), which provides tanker transit counts dating back to 2019 for both Hormuz and Bab al-Mandab. Negative z (fewer tankers than the daily historical baseline) maps to higher risk using the same abs(z) × 2.5 scaling as Hormuz. When PortWatch data is unavailable, the score falls back to Red Sea diversion percentage estimated from Houthi/Red Sea event frequency over the prior 7 days. The Maritime Panel surfaces the z-score directly when available, with the diversion percentage shown as fallback.
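The z-score mapping can be sketched as below. The `abs(z) × 2.5` scaling comes from the text; scoring non-negative z as zero risk, capping at 10, and using the sample standard deviation are assumptions, and the function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def transit_z_score(today_count: float, baseline_counts: Sequence[float]) -> float:
    """z-score of today's transit count against the historical baseline."""
    sigma = stdev(baseline_counts)
    if sigma == 0:
        return 0.0  # flat baseline: no meaningful deviation
    return (today_count - mean(baseline_counts)) / sigma


def chokepoint_risk(z: float) -> float:
    """Map a transit z-score to a 0-10 risk score."""
    if z >= 0:
        return 0.0  # assumption: normal or above-normal traffic adds no risk
    return min(10.0, abs(z) * 2.5)  # assumption: capped at the top of the scale


z = transit_z_score(8, [10, 12, 14])  # two standard deviations below baseline
risk = chokepoint_risk(z)             # 2.0 * 2.5 = 5.0
```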
Weight increased from 10% to 15% reflecting Bab al-Mandab's role as the primary Red Sea chokepoint for Europe-bound Gulf crude.
The market stress signal combines oil volatility (OVX), the Brent-WTI spread, and BDTI tanker rates, using 90-day rolling z-scores to detect anomalous spikes.
OVX has an absolute floor: any OVX > 30 receives a minimum score regardless of z-score. Backwardation signal (Brent vs 5-day average) provides additional supply-tightness context.
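A rolling z-score with an absolute floor might look like the sketch below. The text does not give the floor value or the z-to-score scaling, so `floor_score` and `scale` are placeholder parameters, not the system's values; comparing the latest reading against the preceding window (rather than a window that includes it) is also an assumption.

```python
from statistics import mean, stdev
from typing import Sequence

OVX_FLOOR_TRIGGER = 30.0  # from the text: OVX above 30 receives a minimum score


def rolling_z(history: Sequence[float], window: int = 90) -> float:
    """z-score of the latest value against up to `window` preceding values."""
    if len(history) < 3:
        return 0.0  # not enough data for a standard deviation
    *past, latest = history[-(window + 1):]
    sigma = stdev(past)
    if sigma == 0:
        return 0.0
    return (latest - mean(past)) / sigma


def ovx_score(ovx_history: Sequence[float],
              floor_score: float = 5.0,   # placeholder, not from the text
              scale: float = 2.5) -> float:  # placeholder, not from the text
    """Score OVX on 0-10 from its rolling z-score, with an absolute floor."""
    score = max(0.0, min(10.0, rolling_z(ovx_history) * scale))
    if ovx_history[-1] > OVX_FLOOR_TRIGGER:
        score = max(score, floor_score)
    return score
```

The floor matters in a prolonged crisis: once elevated volatility has persisted long enough to enter the rolling baseline, the z-score drifts back toward zero even though OVX remains high.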
The satellite/seismic signal detects thermal anomalies (NASA FIRMS) and earthquakes (USGS) near oil infrastructure. It uses proximity-based scoring against a database of known facility coordinates.
The momentum signal measures the rate of change in the GOR Index across multiple time horizons. It contributes only during rising conditions (crisis acceleration) and is clamped to 0 when the index is declining.
Four time horizons: 4h (most recent, 40% weight), 12h (25%), 24h (20%), 7-day (15%). The 7-day horizon detects sustained escalation trends distinct from short-term spikes. Event velocity detects sudden surges: when the last 4 hours see 3× more events than the 24-hour average rate, momentum adds a crisis acceleration bonus.
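The horizon weighting and the velocity check can be sketched as follows. The weights and the 3× trigger come from the text; applying the clamp to the weighted sum (rather than to each horizon), comparing the 4-hour count with the average 4-hour count over the last 24 hours, and the function names are assumptions. The size of the acceleration bonus is not stated, so only the trigger is shown.

```python
from typing import Mapping

# Horizon weights from the text.
HORIZON_WEIGHTS = {"4h": 0.40, "12h": 0.25, "24h": 0.20, "7d": 0.15}


def momentum(index_change: Mapping[str, float]) -> float:
    """Weighted rate of change, clamped to 0 when the index is declining.

    `index_change[h]` is the current index minus its value one horizon ago.
    """
    weighted = sum(w * index_change[h] for h, w in HORIZON_WEIGHTS.items())
    return max(0.0, weighted)


def velocity_surge(events_last_4h: int, events_last_24h: int) -> bool:
    """True when the last 4 hours ran at 3x the average 4-hour rate or more."""
    average_per_4h = events_last_24h / 6  # six 4-hour blocks in 24 hours
    return average_per_4h > 0 and events_last_4h >= 3 * average_per_4h


rising = momentum({"4h": 1.2, "12h": 0.8, "24h": 0.5, "7d": 0.1})
falling = momentum({"4h": -1.0, "12h": -0.5, "24h": -0.2, "7d": 0.1})  # 0.0
```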
The GOR Index maps to five named regimes. Upgrades require confirmation across 2 consecutive computation cycles (hysteresis) to prevent false escalation. Downgrades are immediate.
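The asymmetric hysteresis can be illustrated with a small state machine. Regime score thresholds are not listed in this section, so the sketch takes an integer tier (0 for the lowest regime, 4 for the highest) as input; the class name and the rule of adopting the lower of two differing upgrade candidates are assumptions.

```python
from typing import Optional


class RegimeTracker:
    """Upgrades need 2 consecutive cycles; downgrades apply immediately."""

    CONFIRM_CYCLES = 2

    def __init__(self, initial_tier: int = 0) -> None:
        self.tier = initial_tier
        self._pending: Optional[int] = None  # lowest tier seen in the streak
        self._streak = 0                     # consecutive cycles above self.tier

    def update(self, candidate_tier: int) -> int:
        """Feed the tier implied by the latest index value; return the regime."""
        if candidate_tier <= self.tier:
            # Downgrade or hold: take effect now and cancel any pending upgrade.
            self.tier = candidate_tier
            self._pending, self._streak = None, 0
            return self.tier
        # Upgrade candidate: wait for confirmation on consecutive cycles.
        # Assumption: if the candidates differ, adopt the lower one.
        self._pending = (candidate_tier if self._pending is None
                         else min(self._pending, candidate_tier))
        self._streak += 1
        if self._streak >= self.CONFIRM_CYCLES:
            self.tier = self._pending
            self._pending, self._streak = None, 0
        return self.tier


tracker = RegimeTracker()
history = [tracker.update(t) for t in [2, 2, 1, 3, 1, 3, 3]]
# history == [0, 2, 1, 1, 1, 1, 3]
```

In the example, the single-cycle spike to tier 3 in the fourth cycle never changes the regime because the next cycle fails to confirm it.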
The forecast engine generates daily 8-week conditional projections across 3 scenarios: De-Escalation, Base Case, and Escalation. The active scenario is auto-selected daily based on the GOR Index level.
Two-layer architecture:
Chokepoint multiplier — Hormuz status
Duration factor — 8-week horizon
Scenario modifier
The forecast is calibrated against historical supply shocks: the 1973 OPEC embargo (~5% of supply removed), the 1990 Gulf War (~4.3M bpd), and the 2022 Russia-Ukraine sanctions (~2-3M bpd). The elasticity of $3 per 1% of supply removed is conservative; higher historical spikes reflect an uncertainty premium that is captured separately by the NLP and Market subindices. The forecast falls back to a regime-only trajectory when disruption data is unavailable.
Forecast regenerates daily at 06:00 UTC. Estimated API cost: ~$2–4/day across all LLM tasks (briefs, discovery agents, forecast). Discovery agents use Claude Haiku to minimize cost; brief generation and damage verification use Claude Sonnet.
Drift charts at /forecast/chart/[metric] show how each metric's projection has evolved across all historical forecasts — useful for assessing model convergence.
The Fuel Security Index (FSI) tracks downstream vulnerability across 22 import-dependent countries — the nations most exposed to supply disruption when upstream chokepoints are threatened.
Each country is scored 0–10 every 30 minutes across four dimensions:
Countries are classified into response tiers: EMERGENCY (score ≥ 8) · CRITICAL (≥ 7) · RESTRICTED (supply controls active) · MANAGED (government intervention) · ADVISORY (monitoring only)
Supply flow origin data is sourced from UN Comtrade public API (HS codes 2709 crude, 2710 refined petroleum), refreshed monthly. The FSI banner on the main dashboard activates when any country reaches score ≥ 6.
The Physical Signal Layer provides an advisory view of supply-side reality, independent of narrative reporting. It is not included in the GOR composite score — it is a cross-check signal.
Scoring thresholds (week-over-week change in thousand barrels)
| Change | Escalation Score | Interpretation |
|---|---|---|
| < −10,000 kb | 8.0 | Major draw — significant supply stress |
| < −5,000 kb | 6.0 | Moderate draw |
| < 0 kb | 4.0 | Any draw |
| > +5,000 kb | 2.0 | Large build — supply comfort |
| else | 3.0 | Small build or flat |
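The threshold table translates directly into a lookup function. The sketch below evaluates the rows top to bottom, as the table is ordered; the function name is hypothetical.

```python
def eia_escalation_score(change_kb: float) -> float:
    """Map the week-over-week stock change (thousand barrels) to a score."""
    if change_kb < -10_000:
        return 8.0  # major draw
    if change_kb < -5_000:
        return 6.0  # moderate draw
    if change_kb < 0:
        return 4.0  # any draw
    if change_kb > 5_000:
        return 2.0  # large build
    return 3.0      # small build or flat


scores = [eia_escalation_score(c) for c in (-12_000, -6_000, -100, 0, 7_500)]
# scores == [8.0, 6.0, 4.0, 3.0, 2.0]
```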
The AIS divergence detector runs every 5 minutes. It compares the AIS Hormuz transit z-score against the NLP narrative event rate to detect divergence between physical reality and media reporting.
Thresholds: 1.5σ (consistent with U1 corroboration pipeline)
| Divergence Type | Condition | Interpretation |
|---|---|---|
| PHYSICAL_LEADS | AIS z ≤ −1.5 AND NLP ratio < 0.67 | Physical disruption not yet in media — early warning signal |
| NARRATIVE_LEADS | NLP ratio ≥ 1.5 AND AIS z > −0.5 | Media escalation without physical confirmation — possible noise |
| ALIGNED | Neither condition met | Signals consistent |
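The three conditions in the table can be expressed as a single classifier. This sketch assumes the NLP ratio is the current narrative event rate divided by its baseline rate, which the table implies but does not define; the function name is hypothetical.

```python
def classify_divergence(ais_z: float, nlp_ratio: float) -> str:
    """Classify the relationship between the physical and narrative signals."""
    if ais_z <= -1.5 and nlp_ratio < 0.67:
        return "PHYSICAL_LEADS"   # transits down, media quiet
    if nlp_ratio >= 1.5 and ais_z > -0.5:
        return "NARRATIVE_LEADS"  # media elevated, transits normal
    return "ALIGNED"


early_warning = classify_divergence(ais_z=-2.1, nlp_ratio=0.4)
possible_noise = classify_divergence(ais_z=0.2, nlp_ratio=2.3)
consistent = classify_divergence(ais_z=-2.1, nlp_ratio=2.3)
```

Note that a case where both signals are elevated, as in the third call, falls through to ALIGNED because neither divergence condition is met.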
NLP event rate exclusions: satellite (USGS, FIRMS) and market_data (EIA) source classes are excluded from the NLP baseline. Only text-based news signals are counted.
Why advisory only: EIA data covers US crude stocks only (not global). The AIS divergence detector monitors Hormuz only (not all chokepoints). Integration into the GOR composite is planned for U4 when multi-chokepoint AIS coverage and global inventory proxies are available.
Weekly actuals are confirmed from yfinance (S&P 500, XLE, VIX, Brent, WTI) and EIA (SPR levels, gas prices) every Saturday at 08:00 UTC. Each past week's forecast is scored against confirmed values.
Metrics with actual values below 0.5 in magnitude are excluded to avoid denominator distortion.
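The exclusion rule suggests a relative error measure. The sketch below assumes absolute percentage error, which is not named in this section, and shows how small actual values would distort the denominator if they were kept; the function name and the example metric keys are hypothetical.

```python
from typing import Dict, Mapping

MIN_ACTUAL_MAGNITUDE = 0.5  # exclusion threshold from the text


def percentage_errors(forecast: Mapping[str, float],
                      actual: Mapping[str, float]) -> Dict[str, float]:
    """Absolute percentage error per metric, skipping near-zero actuals."""
    errors: Dict[str, float] = {}
    for metric, actual_value in actual.items():
        if metric not in forecast:
            continue
        if abs(actual_value) < MIN_ACTUAL_MAGNITUDE:
            continue  # dividing by a near-zero actual would inflate the error
        errors[metric] = abs(forecast[metric] - actual_value) / abs(actual_value) * 100
    return errors


errors = percentage_errors(
    forecast={"brent": 88.0, "spread": 1.0},
    actual={"brent": 80.0, "spread": 0.2},
)
# "spread" is excluded: its actual value of 0.2 would yield a 400% error.
```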
Accuracy dashboard at /forecast/accuracy.
The system maintains a persistent damage registry for 350+ strategic oil, LNG, refinery, and pipeline facilities globally. Unlike news scoring (which decays), damage records persist until recovery is confirmed.
Each record tracks:
Multiple autonomous agents run continuously to maintain accuracy:
The /disruption page computes a real-time supply disruption composite across four commodity categories, updated every 5 minutes (Redis-cached for 300s):
AT_RISK classification uses two layers: (1) Iran named threat list (SAMREF, Jubail Industrial City, Al Hosn, Mesaieed, Ras Laffan) and (2) FIRMS proximity — any facility within 50km of a NASA FIRMS fire detection in the last 48h (min_frp-filtered; industrial flaring excluded). Facilities with active damage records are classified DISRUPTED/DEGRADED, not AT_RISK.
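The two-layer AT_RISK check can be sketched with a great-circle distance test. This assumes the FIRMS detections passed in have already been filtered to the last 48 hours and by fire radiative power; the haversine formula is one common choice for the distance and is not confirmed as the system's method, and the function names and example coordinates are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt
from typing import Iterable, Tuple

EARTH_RADIUS_KM = 6371.0
PROXIMITY_RADIUS_KM = 50.0  # from the text

LatLon = Tuple[float, float]


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def is_at_risk(facility: LatLon,
               recent_detections: Iterable[LatLon],
               on_named_threat_list: bool = False,
               has_active_damage: bool = False) -> bool:
    """AT_RISK if threatened by name or near a recent fire detection."""
    if has_active_damage:
        return False  # classified DISRUPTED/DEGRADED instead
    if on_named_threat_list:
        return True
    return any(haversine_km(facility, d) <= PROXIMITY_RADIUS_KM
               for d in recent_detections)


facility = (26.0, 50.0)  # illustrative coordinates only
near = is_at_risk(facility, [(26.3, 50.0)])  # about 33 km away
far = is_at_risk(facility, [(27.0, 50.0)])   # about 111 km away
```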
Every ingested event passes through a multi-stage pipeline before contributing to the index:
Every event entering the NLP subindex is assigned a confidence stage that determines its weight in index computation. This separates speed of detection (preserved) from credibility of sustained elevation (enforced).
| Stage | Trigger | NLP Weight |
|---|---|---|
| UNCONFIRMED | Single low/medium-tier source (Telegram, RSS, OSINT) | 0.70× |
| CORROBORATED | Cross-signal detected within 90-min window | 1.00× |
| VERIFIED | Verifier agent confirmed via authoritative source | 1.15× |
| EXPIRED | 90-min window closed, no corroboration received | 0.30× |
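The stage lifecycle in the table can be sketched as a function of timestamps. It assumes that verification takes precedence over the other stages and that a corroborating signal only counts if it arrives inside the 90-minute window; the function name and parameters are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

CORROBORATION_WINDOW = timedelta(minutes=90)  # from the table


def resolve_stage(first_seen: datetime,
                  now: datetime,
                  corroborated_at: Optional[datetime] = None,
                  verified: bool = False) -> str:
    """Return the confidence stage of an event at time `now`."""
    if verified:
        return "VERIFIED"  # assumption: verification overrides other stages
    if (corroborated_at is not None
            and corroborated_at - first_seen <= CORROBORATION_WINDOW):
        return "CORROBORATED"
    if now - first_seen > CORROBORATION_WINDOW:
        return "EXPIRED"   # window closed without corroboration
    return "UNCONFIRMED"


seen = datetime(2026, 3, 1, 12, 0)
fresh = resolve_stage(seen, now=seen + timedelta(minutes=30))
stale = resolve_stage(seen, now=seen + timedelta(minutes=120))
confirmed = resolve_stage(seen, now=seen + timedelta(minutes=120),
                          corroborated_at=seen + timedelta(minutes=45))
```

The returned stage would then select the NLP weight from the table, so an uncorroborated event drops from 0.70× to 0.30× once its window closes.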
Corroboration signals — any one is sufficient
Why 0.70× not 0.00×: OSINT sources, particularly Telegram channels, frequently detect real events 30–90 minutes before mainstream confirmation. Zeroing their contribution would sacrifice early warning capability. The 0.70× weight ensures the index moves on credible OSINT signals while preventing single unconfirmed sources from triggering regime escalation alone.
What users see: Event stage badges are visible on the live feed — UNCONFIRMED, CORROBORATED, VERIFIED, EXPIRED. Sustained elevation indicates corroboration; spike-and-retreat indicates single-source noise.
| Source | Type | Schedule | Reliability | Scoring |
|---|---|---|---|---|
| GDELT | News | 15 min | 0.85 | LLM |
| RSS (11 feeds) | News | 10 min | 0.55–0.95 | LLM |
| NewsAPI | News | 15 min | 0.85 | LLM |
| Telegram (11 ch) | OSINT | 15 min | 0.55–0.80 | LLM |
| AIS (aisstream) | Maritime | 5 min | 0.99 | FORMULA |
| yfinance | Market | 5 min | 0.90 | FORMULA |
| NASA FIRMS | Satellite | 1 hour | 0.99 | FORMULA |
| NOAA NHC | Weather | 1 hour | 0.99 | FORMULA |
| USGS | Seismic | 1 hour | 0.99 | FORMULA |
| IAEA | Official | 2 hours | 0.95 | LLM |
| EIA | Official | 6 hours | 0.99 | FORMULA |
| EIA (expanded) | Official | 6 hours | 0.99 | FORMULA |
| OpenSanctions | Regulatory | 6 hours | 0.95 | FORMULA |
| ACLED | Conflict | 6 hours | 0.90 | FORMULA |
| UN ReliefWeb | Official | 6 hours | 0.90 | FORMULA |
| Groq Vision | Satellite/Media | 15 min | 0.75 | LLM |
| Verifier Agent | Intelligence | 3 hours | 0.90 | LLM |
| GOR Discovery Agent | Intelligence | Daily | 0.80 | LLM |
| FSI Discovery Agent | Intelligence | Daily | 0.80 | LLM |
FORMULA-scored sources bypass the LLM entirely, reducing API costs by ~60%. All sources are free-tier with no API keys required (except ACLED and NewsAPI).
The system uses the Groq free tier (llama-3.3-70b-versatile) with a daily limit of 14,400 requests. Three mechanisms minimize LLM usage:
Events are filtered and boosted using a two-tier keyword taxonomy:
GOR System — Geopolitical Oil Risk Quantification Engine
All data sources are public domain or free-tier. No classified or proprietary intelligence is used. — March 2026