METHODOLOGY

What is the GOR Index?

The Geopolitical Oil Risk (GOR) Index is a composite risk score from 0.0 to 10.0 that quantifies the real-time threat level to global oil supply. It fuses intelligence from 18 data sources across six signal dimensions: NLP-scored events, maritime chokepoint monitoring, market stress indicators, satellite/seismic detection, trend momentum, and infrastructure damage.

The index is recomputed every 5 minutes and drives a five-tier regime classification with hysteresis to prevent rapid oscillation between states.

Composite Index Formula

The GOR Index separates reality into two tracks that behave differently over time:

FLOW = (NLP × 0.32) + (Maritime × 0.26) + (Market × 0.19) + (Satellite × 0.13) + (Momentum × 0.10)
STATE = (Infrastructure × 0.85) + (Divergence × 0.15)
GOR = max(STATE × 0.60, 0.75 × FLOW + 0.25 × STATE)
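The three formulas can be checked with a short Python sketch. It assumes every sub-index is already scored on the 0–10 scale. The function name `gor_index` is illustrative and is not taken from the system's code.

```python
def gor_index(nlp, maritime, market, satellite, momentum, infrastructure, divergence):
    """Return (flow, state, gor) for sub-index scores on a 0-10 scale."""
    # Flow track: transient signals that decay with the news cycle.
    flow = (nlp * 0.32 + maritime * 0.26 + market * 0.19
            + satellite * 0.13 + momentum * 0.10)
    # State track: persistent physical damage plus the divergence detector.
    state = infrastructure * 0.85 + divergence * 0.15
    # Blend the tracks, but never fall below 60% of the state score.
    gor = max(state * 0.60, 0.75 * flow + 0.25 * state)
    return flow, state, gor


# Quiet news cycle with heavy confirmed damage: the state floor decides the index.
print(gor_index(0, 0, 0, 0, 0, infrastructure=10, divergence=0))
```

With no flow activity and infrastructure at 10.0, the blended term is only 0.25 × 8.5 ≈ 2.1. The floor of 0.60 × 8.5 = 5.1 therefore sets the index.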

Flow Track — decays with events

NLP Events: 32%
Maritime: 26%
Market: 19%
Satellite: 13%
Momentum: 10%

State Track — persistent physical reality

Infrastructure Damage: 85%
Divergence Detector: 15%

Why two tracks?

Flow signals capture what is happening right now — news events, market reactions, maritime anomalies. These should decay. When the news cycle quiets and markets stabilise, the flow score falls. Absence of new events is itself information.

State signals capture what the world actually is: confirmed physical damage to facilities and persistent lost supply capacity. These should not decay just because the media has moved on. If South Pars is offline and no journalist reports on it tomorrow, the physical damage is still there.

The state floor

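The index cannot fall below 60% of the state score. At the maximum state score of 10.0, the floor is 6.0 (C ELEVATED). That is the minimum appropriate regime when so much supply capacity is offline. Infrastructure carries 85% of the state track, so maximum infrastructure damage on its own (infrastructure score 10.0, divergence 0) yields a state score of 8.5 and a floor of 5.1. At minor damage (state score 2.0), the floor is 1.2, which is negligible. The floor scales with actual severity.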

Why does Infrastructure dominate the state track (85%)?

The facility damage registry — built from 350+ strategic facilities with confidence scores, production impact percentages, and recovery timelines — is cross-checked by the verifier agent every 3 hours. It is the closest thing this system has to ground truth about the world's physical state.

Why is EIA stock data not in the state track?

EIA measures US domestic crude inventories. During a Hormuz closure, US stocks may build (comfortable domestic supply) while the strait is shut. Using this as a state signal would actively lower the state floor — the wrong direction. EIA data remains in the flow track via the NLP subindex.

Sub-Index Definitions

1. NLP Event Score

Aggregates LLM-scored events from the last 4 hours. Each event is rated on escalation (1–10), disruption probability (0–1), actor capability, and infrastructure specificity.

article_raw = escalation × source_reliability × actor_capability × novelty_factor × tier_multiplier × stage_weight
NLP = avg(article_raw_weighted) + disruption_boost

Tier multipliers: Tier 1 keywords = 3.0×, Tier 2 = 1.5×, unmatched = 1.0×. Stage weights: unconfirmed = 0.70×, corroborated = 1.00×, verified = 1.15×, expired = 0.30×. Disruption boost activates when probability > 0.65 with infrastructure specificity.
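The NLP aggregation can be sketched in Python as follows. Several details are assumptions because the methodology does not state them:

- actor capability and novelty are normalised to 0–1;
- "article_raw_weighted" is the plain mean of `article_raw` over the 4-hour window;
- the result is clamped to 10;
- the boost size `DISRUPTION_BOOST` is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional

TIER_MULTIPLIER = {1: 3.0, 2: 1.5, None: 1.0}
STAGE_WEIGHT = {"unconfirmed": 0.70, "corroborated": 1.00,
                "verified": 1.15, "expired": 0.30}
DISRUPTION_BOOST = 1.0  # HYPOTHETICAL: the boost size is not given in the methodology


@dataclass
class ScoredEvent:
    timestamp: datetime
    escalation: float              # 1-10, from the LLM
    disruption_probability: float  # 0-1, from the LLM
    infrastructure_specific: bool
    source_reliability: float      # e.g. 0.85 for GDELT
    actor_capability: float        # ASSUMED normalised to 0-1
    novelty_factor: float          # ASSUMED normalised to 0-1
    keyword_tier: Optional[int]    # 1, 2, or None when unmatched
    stage: str                     # key into STAGE_WEIGHT


def article_raw(e: ScoredEvent) -> float:
    return (e.escalation * e.source_reliability * e.actor_capability
            * e.novelty_factor * TIER_MULTIPLIER[e.keyword_tier]
            * STAGE_WEIGHT[e.stage])


def nlp_score(events, now: datetime, window_hours: int = 4) -> float:
    recent = [e for e in events if now - e.timestamp <= timedelta(hours=window_hours)]
    if not recent:
        return 0.0
    base = mean(article_raw(e) for e in recent)
    boost = DISRUPTION_BOOST if any(
        e.disruption_probability > 0.65 and e.infrastructure_specific for e in recent
    ) else 0.0
    # ASSUMED clamp: tier and stage multipliers can push raw values past 10.
    return min(base + boost, 10.0)
```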

2. Maritime Chokepoint Score

Monitors Strait of Hormuz transits and Bab al-Mandab diversions via AIS vessel tracking. Low transit z-scores indicate blockage risk.

Maritime = (Hormuz_zscore × 0.45) + (Anchor_score × 0.20) + (Diversion_score × 0.20) + (Bab_score × 0.15)

Hormuz score inverts the z-score: negative z (fewer transits) = higher risk. Red Sea diversion percentage is estimated from Houthi/Red Sea event frequency over 7 days.
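The Maritime formula can be sketched in Python. The sketch reuses the abs(z) × 2.5 scaling described for Bab al-Mandab. It assumes that a positive Hormuz z-score (more transits than usual) contributes 0 and that the anchor, diversion and Bab al-Mandab components arrive already scored on 0–10.

```python
def hormuz_score(z):
    """Map a Hormuz transit z-score to 0-10: only shortfalls (negative z) add risk."""
    # abs(z) x 2.5 scaling, capped at 10; positive z scores 0 (assumption).
    return min(abs(z) * 2.5, 10.0) if z < 0 else 0.0


def maritime_score(hormuz_z, anchor_score, diversion_score, bab_score):
    """Anchor, diversion and Bab al-Mandab components are assumed pre-scored on 0-10."""
    return (hormuz_score(hormuz_z) * 0.45 + anchor_score * 0.20
            + diversion_score * 0.20 + bab_score * 0.15)
```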

Maritime Intelligence — AIS Tracking

Vessel positions are sourced via aisstream.io WebSocket streaming, monitoring two chokepoint bounding boxes simultaneously: the Strait of Hormuz (latitude 25–27°N, longitude 56–58°E) and Bab al-Mandab (latitude 11.5–13°N, longitude 43–45°E). Each vessel is tagged to its transit zone and tracked independently in Redis.

Two AIS message types are processed:

- PositionReport (message types 1/2/3) is broadcast every few seconds while a vessel is under way, and every 3 minutes at anchor. It provides position, speed over ground (SOG), course, and navigation status.
- ShipStaticData (message type 5) is broadcast roughly every 6 minutes. It provides vessel name, ship type, dimensions, and MaximumStaticDraught, the key cargo signal.

Vessel Filtering

Only vessels matching both criteria are counted:

— AIS ship type 80–89 (the tanker category, which includes crude and product tankers)
— Vessel length ≥ 280 metres (derived from ShipStaticData Dimension fields)

This combination reliably identifies VLCCs and large Suezmax tankers.
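The zone tagging and vessel filter can be sketched in Python. The sketch uses the bounding boxes given above. It assumes vessel length is Dimension A + Dimension B from ShipStaticData, where A and B are the AIS distances from the position reference point to the bow and to the stern. The function names are hypothetical.

```python
# Bounding boxes from the methodology: (lat_min, lat_max, lon_min, lon_max)
ZONES = {
    "hormuz": (25.0, 27.0, 56.0, 58.0),
    "bab_al_mandab": (11.5, 13.0, 43.0, 45.0),
}


def zone_for(lat, lon):
    """Return the chokepoint zone containing the position, or None."""
    for name, (lat_min, lat_max, lon_min, lon_max) in ZONES.items():
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max:
            return name
    return None


def is_large_tanker(ship_type, dim_a, dim_b):
    """dim_a/dim_b: metres from the AIS reference point to bow and stern."""
    return 80 <= ship_type <= 89 and (dim_a + dim_b) >= 280
```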

Laden / Ballast Classification

Draught is the depth of the hull below the waterline. A VLCC fully loaded with ~2 million barrels of crude oil sits ~20–21 metres deep. The same vessel returning empty (ballast) sits ~10–12 metres deep.

Classification threshold: 16 metres

— Above 16m → LADEN (carrying cargo)
— Below 16m → BALLAST (returning empty)

Barrel estimate: laden_count × 2,000,000 barrels (approximate VLCC capacity)
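The laden/ballast split and barrel estimate can be sketched in Python. The sketch assumes that a draught of exactly 16.0 m counts as ballast, since the threshold wording leaves the boundary open. It also treats a draught of 0 as missing, because AIS uses 0 to mean "not available".

```python
from typing import Optional

LADEN_THRESHOLD_M = 16.0
VLCC_BARRELS = 2_000_000  # approximate VLCC capacity


def classify_draught(draught_m: Optional[float]) -> Optional[str]:
    """Return LADEN, BALLAST, or None when draught is missing."""
    if draught_m is None or draught_m <= 0:
        return None
    # ASSUMPTION: exactly 16.0 m is treated as ballast.
    return "LADEN" if draught_m > LADEN_THRESHOLD_M else "BALLAST"


def hormuz_cargo_summary(draughts):
    labels = [classify_draught(d) for d in draughts]
    laden = labels.count("LADEN")
    return {
        "transits": len(draughts),  # vessels without draught still count here
        "laden": laden,
        "ballast": labels.count("BALLAST"),
        "estimated_barrels": laden * VLCC_BARRELS,
    }
```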

Limitations

Vessels with AIS transponders disabled (the shadow fleet) are not counted. Draught data may be absent for some vessels; these are counted in the transit total but excluded from the laden/ballast breakdown. Laden/ballast classification applies to Hormuz transits only. The 24-hour rolling count tracks each unique MMSI in Redis with a 24-hour TTL per zone.
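The per-MMSI TTL scheme can be illustrated with an in-memory stand-in for Redis, so that it runs without a server. In this sketch, re-sighting a vessel refreshes its expiry. That behaviour is an assumption: the methodology does not say whether the TTL is reset on every sighting.

```python
import time


class RollingTransitCounter:
    """In-memory stand-in for per-MMSI Redis keys that expire after 24 hours."""

    def __init__(self, ttl_seconds=24 * 3600, clock=time.time):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._expires_at = {}  # (zone, mmsi) -> expiry timestamp

    def record(self, zone, mmsi):
        # Re-sighting a vessel refreshes its TTL, like re-issuing SET with EX.
        self._expires_at[(zone, mmsi)] = self.clock() + self.ttl_seconds

    def count(self, zone):
        now = self.clock()
        self._expires_at = {k: t for k, t in self._expires_at.items() if t > now}
        return sum(1 for z, _ in self._expires_at if z == zone)
```

With redis-py, a comparable write would be `r.set(f"ais:{zone}:{mmsi}", 1, ex=86400)`. The key naming there is hypothetical.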

Bab al-Mandab AIS Tracking

The Bab al-Mandab strait (latitude 11.5–13°N, longitude 43–45°E) is monitored via the same aisstream.io WebSocket connection as Hormuz. VLCC transits (ship type 80–89, length ≥ 280m) are counted over a 24-hour rolling window using Redis TTL per MMSI.

Z-scores are computed against a 7-year daily baseline from IMF PortWatch (portwatch.imf.org), which provides tanker transit counts dating back to 2019 for both Hormuz and Bab al-Mandab. Negative z (fewer tankers than the daily historical baseline) maps to higher risk using the same abs(z) × 2.5 scaling as Hormuz. When PortWatch data is unavailable, the score falls back to Red Sea diversion percentage estimated from Houthi/Red Sea event frequency over the prior 7 days. The Maritime Panel surfaces the z-score directly when available, with the diversion percentage shown as fallback.
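The Bab al-Mandab score and its fallback can be sketched in Python. The sketch assumes the PortWatch baseline arrives as a list of historical daily tanker counts, and that the z-score uses their mean and sample standard deviation. The methodology does not say whether the baseline is seasonally matched. The mapping from diversion percentage to score is hypothetical.

```python
from statistics import mean, stdev
from typing import Optional, Sequence


def transit_zscore(today_count, baseline_counts):
    sigma = stdev(baseline_counts)
    return 0.0 if sigma == 0 else (today_count - mean(baseline_counts)) / sigma


def bab_score(today_count, baseline_counts: Optional[Sequence[float]] = None,
              diversion_pct: Optional[float] = None):
    if baseline_counts and len(baseline_counts) >= 2:
        z = transit_zscore(today_count, baseline_counts)
        # Same abs(z) x 2.5 scaling as Hormuz; only shortfalls add risk.
        return min(abs(z) * 2.5, 10.0) if z < 0 else 0.0
    if diversion_pct is not None:
        # HYPOTHETICAL fallback mapping: 100% diversion -> 10.0
        return min(max(diversion_pct, 0.0) / 10, 10.0)
    return 0.0
```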

The Bab al-Mandab weight in the Maritime score was increased from 10% to 15%. This reflects its role as the primary Red Sea chokepoint for Europe-bound Gulf crude.

3. Market Stress Score

Combines oil volatility (OVX), Brent-WTI spread, and BDTI tanker rates. Uses 90-day rolling z-scores to detect anomalous spikes.

Market = (OVX_zscore × 0.50) + (Spread_score × 0.30) + (BDTI_zscore × 0.20)

OVX has an absolute floor: any OVX > 30 receives a minimum score regardless of z-score. A backwardation proxy (Brent relative to its 5-day average) provides additional supply-tightness context.
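The Market formula can be sketched in Python using 90-observation rolling z-scores. The methodology does not give several values, so the following are hypothetical:

- the z-to-score mapping (`z_to_score`, reusing a 2.5 scale);
- the OVX floor value (`OVX_FLOOR_SCORE`);
- whether the floor applies to the OVX component or to the whole score. The sketch applies it to the component.

`spread_score` is taken as an input.

```python
from statistics import mean, stdev

OVX_ABSOLUTE_THRESHOLD = 30.0
OVX_FLOOR_SCORE = 6.0  # HYPOTHETICAL: the minimum score is not stated


def rolling_z(series, window=90):
    """z-score of the latest value against the preceding `window` observations."""
    history = series[-window - 1:-1]
    if len(history) < 2:
        return 0.0
    sigma = stdev(history)
    return 0.0 if sigma == 0 else (series[-1] - mean(history)) / sigma


def z_to_score(z, scale=2.5):
    # HYPOTHETICAL mapping: only upward spikes add stress, clamped to 0-10.
    return max(0.0, min(z * scale, 10.0))


def market_score(ovx_series, spread_score, bdti_series):
    ovx_component = z_to_score(rolling_z(ovx_series))
    if ovx_series[-1] > OVX_ABSOLUTE_THRESHOLD:
        ovx_component = max(ovx_component, OVX_FLOOR_SCORE)  # absolute floor
    bdti_component = z_to_score(rolling_z(bdti_series))
    return ovx_component * 0.50 + spread_score * 0.30 + bdti_component * 0.20
```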

4. Satellite & Seismic Score

Detects thermal anomalies (NASA FIRMS) and earthquakes (USGS) near oil infrastructure. Uses proximity-based scoring against a database of known facility coordinates.

FIRMS: ≤10km = 8, 10–30km = 6, 30–50km = 4
USGS: M6+ ≤30km = 9, M5+ ≤50km = 7, M4+ ≤30km = 5
Cluster bonus: >5 fire detections in 24h adds +2
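The proximity rules can be sketched in Python using great-circle (haversine) distances. The sketch assumes:

- the sub-index is the best single match across all facilities;
- the cluster bonus counts distinct detections within 50 km of any facility;
- the total is capped at 10.

```python
import math

FIRMS_BANDS = [(10, 8), (30, 6), (50, 4)]               # (max distance km, points)
USGS_RULES = [(6.0, 30, 9), (5.0, 50, 7), (4.0, 30, 5)]  # (min magnitude, max km, points)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance on a spherical Earth (radius 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlam = phi2 - phi1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def firms_points(dist_km):
    return next((pts for max_km, pts in FIRMS_BANDS if dist_km <= max_km), 0)


def usgs_points(magnitude, dist_km):
    # Strongest rule first: an M6.2 at 40 km falls through to the M5+ rule.
    return next((pts for min_mag, max_km, pts in USGS_RULES
                 if magnitude >= min_mag and dist_km <= max_km), 0)


def satellite_score(facilities, fires_24h, quakes):
    """facilities, fires_24h: [(lat, lon)]; quakes: [(lat, lon, magnitude)]."""
    best, nearby_fires = 0, set()
    for f_lat, f_lon in facilities:
        for i, (lat, lon) in enumerate(fires_24h):
            dist = haversine_km(f_lat, f_lon, lat, lon)
            best = max(best, firms_points(dist))
            if dist <= 50:
                nearby_fires.add(i)
        for lat, lon, mag in quakes:
            best = max(best, usgs_points(mag, haversine_km(f_lat, f_lon, lat, lon)))
    if len(nearby_fires) > 5:
        best += 2  # cluster bonus
    return min(best, 10)
```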

5. Trend Momentum

Measures rate of change in the GOR Index across multiple time horizons. Only contributes during rising conditions (crisis acceleration). Clamped to 0 when the index is declining.

trend = (Δ4h × 0.40 + Δ12h × 0.25 + Δ24h × 0.20 + Δ7d × 0.15) × 2.0
velocity_bonus = if events_4h > 3× average → up to +3.0
Momentum = max(trend + velocity_bonus, 0.0)

Four time horizons are used: 4h (most recent, 40% weight), 12h (25%), 24h (20%), and 7-day (15%). The 7-day horizon detects sustained escalation trends, as distinct from short-term spikes. Event velocity detects sudden surges: when the last 4 hours contain more than 3× the events implied by the 24-hour average rate, momentum adds a crisis acceleration bonus of up to +3.0.
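The momentum formula can be sketched in Python. Each delta is GOR(now) minus GOR at that horizon. The 24-hour event count is scaled to one 4-hour window. Two choices are hypothetical: the bonus ramp (+1.0 per multiple above 3×, capped at +3.0) and the overall cap of 10.

```python
VELOCITY_MULTIPLE = 3.0
MAX_VELOCITY_BONUS = 3.0


def momentum_score(d4h, d12h, d24h, d7d, events_4h, events_24h):
    """Deltas are GOR(now) - GOR(now - horizon)."""
    trend = (d4h * 0.40 + d12h * 0.25 + d24h * 0.20 + d7d * 0.15) * 2.0
    expected_4h = events_24h / 6  # 24h average rate scaled to one 4h window
    bonus = 0.0
    if expected_4h > 0 and events_4h > VELOCITY_MULTIPLE * expected_4h:
        # HYPOTHETICAL ramp: +1.0 per multiple above 3x, capped at +3.0
        bonus = min(events_4h / expected_4h - VELOCITY_MULTIPLE, MAX_VELOCITY_BONUS)
    # Clamped to 0 while the index is declining; the upper cap of 10 is assumed.
    return min(max(trend + bonus, 0.0), 10.0)
```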

Regime Classification

The GOR Index maps to five named regimes. Upgrades require confirmation across 2 consecutive computation cycles (hysteresis) to prevent false escalation. Downgrades are immediate.

Regime	GOR range	Description
A NOISE	0.0 – 3.0	Background noise
B TENSION	3.1 – 5.5	Elevated rhetoric
C ELEVATED	5.6 – 7.0	Active threat
D CRISIS	7.1 – 8.5	Supply disruption
E SUPPLY SHOCK	8.6 – 10.0	Major event
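The regime mapping and its hysteresis rule can be sketched as a small Python state machine. The sketch makes two assumptions:

- values in the gaps between the listed ranges (for example 3.05) go to the higher regime;
- a confirmed upgrade moves only as far as both confirming cycles support.

```python
REGIMES = ["A NOISE", "B TENSION", "C ELEVATED", "D CRISIS", "E SUPPLY SHOCK"]
# ASSUMPTION: gap values (e.g. 3.05) go to the higher regime,
# so only the upper bounds of A-D matter.
UPPER_BOUNDS = [3.0, 5.5, 7.0, 8.5]


def raw_regime(gor):
    """Index of the regime whose range contains gor, ignoring hysteresis."""
    for level, bound in enumerate(UPPER_BOUNDS):
        if gor <= bound:
            return level
    return len(UPPER_BOUNDS)


class RegimeTracker:
    """Upgrades need confirm_cycles consecutive higher readings; downgrades apply at once."""

    def __init__(self, confirm_cycles=2):
        self.confirm_cycles = confirm_cycles
        self.current = 0
        self._pending = []  # raw levels from consecutive cycles above the current regime

    def update(self, gor):
        level = raw_regime(gor)
        if level <= self.current:
            self.current = level  # immediate downgrade (or hold)
            self._pending.clear()
        else:
            self._pending.append(level)
            if len(self._pending) >= self.confirm_cycles:
                # Upgrade only as far as every confirming cycle supports.
                self.current = min(self._pending[-self.confirm_cycles:])
                self._pending.clear()
        return REGIMES[self.current]
```

A single spike to 9.0 followed by a reading of 2.0 never leaves A NOISE. Two consecutive readings of 6.0 and 6.2 move the regime to C ELEVATED.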

8-Week Conditional Forecast Engine

The forecast engine generates daily 8-week conditional projections across 3 scenarios: De-Escalation, Base Case, and Escalation. The active scenario is auto-selected daily based on the GOR Index level.

Scenario auto-selection:
  GOR < 3.5 → De-Escalation active
  3.5–6.5 → Base Case active
  GOR > 6.5 → Escalation active

Two-layer architecture:

Energy Prices — Disruption-Anchored Formula

Oil price projections use the disruption index as the primary mechanical input, replacing the previous regime-only trajectory model.

weekly_Δbrent = BASE_ELASTICITY × disruption_pct × chokepoint_multiplier × duration_factor × scenario_modifier
BASE_ELASTICITY = $3.00 per 1% supply disruption per week
MAX_WEEKLY_DELTA = $15.00/week (demand destruction cap)

Chokepoint multiplier — Hormuz status

Hormuz status	Multiplier
NORMAL	1.0×
RESTRICTED	1.4×
BLOCKED	1.8×
CLOSED	2.2×

Duration factor — 8-week horizon

Factor values across the horizon, in order: 0.6, 0.8, 1.0, 1.1, 1.15, 1.1, 1.0

Scenario modifier

Scenario	Modifier
Escalation	1.15×
Base	1.00×
De-escalation	0.70×

The elasticity is calibrated against historical supply shocks: the 1973 OPEC embargo (~5% of supply removed), the 1990 Gulf War (~4.3M bpd), and the 2022 Russia-Ukraine sanctions (~2–3M bpd). The $3/1% elasticity is conservative. Higher historical spikes reflect an uncertainty premium, which is captured separately by the NLP and Market subindices. When disruption data is unavailable, the engine falls back to the regime-only trajectory.
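The weekly Brent delta and scenario auto-selection can be sketched in Python. Seven duration factors are listed for an eight-week horizon. The sketch therefore assumes that factor i applies to week i+1 and that later weeks reuse the last value; this week mapping is hypothetical. The function names are illustrative.

```python
BASE_ELASTICITY = 3.00    # $ per 1% supply disruption per week
MAX_WEEKLY_DELTA = 15.00  # demand destruction cap, $/week
CHOKEPOINT_MULTIPLIER = {"NORMAL": 1.0, "RESTRICTED": 1.4, "BLOCKED": 1.8, "CLOSED": 2.2}
SCENARIO_MODIFIER = {"escalation": 1.15, "base": 1.00, "de-escalation": 0.70}
# Duration factors as listed. ASSUMPTION: value i applies to week i+1,
# and weeks past the end of the list reuse the last value.
DURATION_FACTOR = [0.6, 0.8, 1.0, 1.1, 1.15, 1.1, 1.0]


def select_scenario(gor):
    if gor < 3.5:
        return "de-escalation"
    return "base" if gor <= 6.5 else "escalation"


def weekly_delta_brent(disruption_pct, hormuz_status, week, scenario):
    duration = DURATION_FACTOR[min(week, len(DURATION_FACTOR)) - 1]
    delta = (BASE_ELASTICITY * disruption_pct * CHOKEPOINT_MULTIPLIER[hormuz_status]
             * duration * SCENARIO_MODIFIER[scenario])
    return min(delta, MAX_WEEKLY_DELTA)


def brent_path(start_price, disruption_pct, hormuz_status, gor, weeks=8):
    scenario = select_scenario(gor)
    price, path = start_price, []
    for week in range(1, weeks + 1):
        price += weekly_delta_brent(disruption_pct, hormuz_status, week, scenario)
        path.append(round(price, 2))
    return scenario, path
```

Worked example: a 2% disruption with Hormuz CLOSED, in week 1 of the Escalation scenario, gives 3.00 × 2 × 2.2 × 0.6 × 1.15 ≈ $9.11 for that week. This is below the $15 cap.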

Macro Impact — Claude Sonnet (LLM)

12 dimensions are projected, including S&P 500, XLE, VIX, USD/EUR, US and EU GDP, global trade, shipping costs, CPI, and consumer sentiment. Projections are anchored to live market data fetched daily (SPY, XLE, VIX, OVX, Brent, WTI, USD/EUR) via yfinance and Alpha Vantage.

Forecast regenerates daily at 06:00 UTC. Estimated API cost: ~$2–4/day across all LLM tasks (briefs, discovery agents, forecast). Discovery agents use Claude Haiku to minimize cost; brief generation and damage verification use Claude Sonnet.

Drift charts at /forecast/chart/[metric] show how each metric's projection has evolved across all historical forecasts — useful for assessing model convergence.

Fuel Security Index (FSI)

The Fuel Security Index (FSI) tracks downstream vulnerability across 22 import-dependent countries — the nations most exposed to supply disruption when upstream chokepoints are threatened.

Each country is scored 0–10 every 30 minutes across four dimensions:

Strategic reserve days: days of current consumption covered by existing stocks
Import dependency: % of fuel consumption sourced from imports
Hormuz supply-flow exposure: % of imports transiting Hormuz-adjacent routes, from UN Comtrade bilateral trade data
Active stress signals: rationing announcements, dry station reports, price subsidy changes, panic buying

FSI Score = (reserve_stress × 0.40) + (import_dependency × 0.25) + (hormuz_exposure × 0.20) + (stress_signals × 0.15)

reserve_stress inverts reserve days: 0 days = score 10, 90+ days = score 0
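The FSI formula can be sketched in Python. The sketch makes three assumptions:

- the reserve-days inversion is linear between 0 and 90 days;
- the two percentage inputs are scaled to 0–10 by dividing by 10;
- stress signals arrive pre-scored on 0–10.

```python
def reserve_stress(reserve_days):
    # ASSUMED linear inversion: 0 days -> 10, 90+ days -> 0
    return 10.0 * (1 - min(max(reserve_days, 0), 90) / 90)


def fsi_score(reserve_days, import_dependency_pct, hormuz_exposure_pct, stress_signals):
    """Percentages are 0-100; stress_signals is assumed pre-scored on 0-10."""
    return (reserve_stress(reserve_days) * 0.40
            + (import_dependency_pct / 10) * 0.25
            + (hormuz_exposure_pct / 10) * 0.20
            + stress_signals * 0.15)
```

For example, a fully import-dependent country with 45 days of reserves, 50% Hormuz exposure and a stress-signal score of 4 scores about 6.1. That is enough to trigger the dashboard banner at ≥ 6.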

Countries are classified into response tiers: EMERGENCY (score ≥ 8) · CRITICAL (≥ 7) · RESTRICTED (supply controls active) · MANAGED (government intervention) · ADVISORY (monitoring only)

FSI Discovery Agent — daily 10:00 UTC

Uses Claude Haiku (claude-haiku-4-5-20251001) with web_search to find current fuel reserve levels and shortage signals for all 22 tracked countries. Findings are written to the events table and update reserve estimates when credible data is found (confidence ≥ 0.5). Redis dedup prevents duplicate events (25h TTL). ~28 web searches per run.

Supply flow origin data is sourced from UN Comtrade public API (HS codes 2709 crude, 2710 refined petroleum), refreshed monthly. The FSI banner on the main dashboard activates when any country reaches score ≥ 6.

Physical Signal Layer

The Physical Signal Layer provides an advisory view of supply-side reality, independent of narrative reporting. It is not included in the GOR composite score — it is a cross-check signal.

EIA Weekly Crude Stocks

Series: WCRSTUS1 (US total crude oil stocks, National)
Source: US Energy Information Administration API v2
Schedule: Published weekly (Wednesdays); ingested every 6 hours
History: 52 weeks of history maintained for baseline comparisons

Scoring thresholds (week-over-week change in thousand barrels)

Change	Escalation Score	Interpretation
< −10,000 kb	8.0	Major draw — significant supply stress
< −5,000 kb	6.0	Moderate draw
< 0 kb	4.0	Any draw
> +5,000 kb	2.0	Large build — supply comfort
else	3.0	Small build or flat
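The threshold table translates directly into Python. The sketch checks the rows from top to bottom and derives the week-over-week change from a chronological list of weekly stock levels in thousand barrels. Both function names are illustrative.

```python
def wow_change_kb(weekly_stocks_kb):
    """Week-over-week change from chronological weekly stock levels (thousand barrels)."""
    return weekly_stocks_kb[-1] - weekly_stocks_kb[-2]


def eia_stock_score(change_kb):
    """Escalation score for a week-over-week change, per the threshold table."""
    if change_kb < -10_000:
        return 8.0  # major draw
    if change_kb < -5_000:
        return 6.0  # moderate draw
    if change_kb < 0:
        return 4.0  # any draw
    if change_kb > 5_000:
        return 2.0  # large build
    return 3.0      # small build or flat
```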

AIS/NLP Divergence Detector

Runs every 5 minutes. Compares the AIS Hormuz transit z-score against the NLP narrative event rate to detect divergence between physical reality and media reporting.

Thresholds: 1.5σ (consistent with U1 corroboration pipeline)

Divergence Type	Condition	Interpretation
PHYSICAL_LEADS	AIS z ≤ −1.5 AND NLP ratio < 0.67	Physical disruption not yet in media — early warning signal
NARRATIVE_LEADS	NLP ratio ≥ 1.5 AND AIS z > −0.5	Media escalation without physical confirmation — possible noise
ALIGNED	Neither condition met	Signals consistent
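The divergence rules can be sketched in Python. Events are dicts with a `source_class` field. The NLP ratio is taken as the text-only event count in the current window divided by a baseline count for an equal-length window. That definition, the class name `"news"` used in examples, and the neutral ratio of 1.0 when no baseline exists are all assumptions.

```python
EXCLUDED_SOURCE_CLASSES = {"satellite", "market_data"}  # USGS/FIRMS and EIA
AIS_SHORTFALL_Z = -1.5
AIS_NORMAL_Z = -0.5
NLP_SURGE_RATIO = 1.5
NLP_QUIET_RATIO = 0.67


def narrative_ratio(recent_events, baseline_count):
    """ASSUMED definition: text-only count in the window over its baseline count."""
    text_events = sum(1 for e in recent_events
                      if e["source_class"] not in EXCLUDED_SOURCE_CLASSES)
    # ASSUMPTION: no baseline yet -> treat the narrative rate as neutral.
    return text_events / baseline_count if baseline_count > 0 else 1.0


def classify_divergence(ais_z, nlp_ratio):
    if ais_z <= AIS_SHORTFALL_Z and nlp_ratio < NLP_QUIET_RATIO:
        return "PHYSICAL_LEADS"   # transits down, media quiet
    if nlp_ratio >= NLP_SURGE_RATIO and ais_z > AIS_NORMAL_Z:
        return "NARRATIVE_LEADS"  # media surge, transits normal
    return "ALIGNED"
```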

NLP event rate exclusions: satellite (USGS, FIRMS) and market_data (EIA) source classes are excluded from the NLP baseline. Only text-based news signals are counted.

Why advisory only: EIA data covers US crude stocks only (not global). The AIS divergence detector monitors Hormuz only (not all chokepoints). Integration into the GOR composite is planned for U4 when multi-chokepoint AIS coverage and global inventory proxies are available.

Forecast Accuracy Tracking

Weekly actuals are confirmed from yfinance (S&P 500, XLE, VIX, Brent, WTI) and EIA (SPR levels, gas prices) every Saturday at 08:00 UTC. Each past week's forecast is scored against confirmed values.

Accuracy = 1 − abs(predicted − actual) / actual

Metrics with actual values below 0.5 in magnitude are excluded to avoid denominator distortion.
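The accuracy formula and exclusion rule can be sketched in Python. The sketch uses |actual| in the denominator, an assumption that keeps negative actuals (for example, contracting GDP) on the same footing as positive ones.

```python
from typing import Optional


def forecast_accuracy(predicted: float, actual: float,
                      min_abs_actual: float = 0.5) -> Optional[float]:
    """1 - |predicted - actual| / |actual|, or None when actual is too small to score."""
    if abs(actual) < min_abs_actual:
        return None  # excluded to avoid denominator distortion
    # ASSUMPTION: |actual| in the denominator so negative actuals score sensibly.
    return 1 - abs(predicted - actual) / abs(actual)
```

The score drops below zero once the error exceeds the actual value. For example, predicting 250 against an actual of 100 scores −0.5.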

Accuracy dashboard at /forecast/accuracy.

Facility Damage Persistence Engine

The system maintains a persistent damage registry for 350+ strategic oil, LNG, refinery, and pipeline facilities globally. Unlike news scoring (which decays), damage records persist until recovery is confirmed.

Each record tracks:

Damage level: minor / moderate / severe
Production capacity offline (%)
Recovery timeline: low / mid / high estimate in weeks
Confidence score (0.0–1.0), decays −0.05 per 12h if unverified
Revision history with full audit trail
is_manual flag: manual operator overrides are never auto-modified

facility_score = (strategic_importance / 10) × (production_impact / 100) × 10 × time_decay × confidence
  where time_decay = max(0, 1 − days_elapsed / recovery_days) — decays to 0 as recovery completes

Multi-facility aggregation (diminishing weights):
  stack weights: [1.0, 0.7, 0.5, 0.3, 0.2, 0.1, …]
  infrastructure_val = min(score₀ × 1.0 + score₁ × 0.7 + score₂ × 0.5 + …, 10.0)
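The facility score, confidence decay and stacked aggregation can be sketched in Python. The sketch makes four assumptions:

- facility scores are sorted highest first before the stack weights are applied;
- facilities beyond the sixth reuse the 0.1 weight;
- a recovery time of zero means no remaining impact;
- the −0.05 confidence decay is applied once per full 12 hours without verification.

```python
STACK_WEIGHTS = [1.0, 0.7, 0.5, 0.3, 0.2, 0.1]


def decayed_confidence(confidence, hours_unverified):
    # ASSUMPTION: the -0.05 step applies once per full 12h without verification.
    return max(0.0, confidence - 0.05 * (hours_unverified // 12))


def facility_score(strategic_importance, production_impact_pct,
                   days_elapsed, recovery_days, confidence):
    # Linear decay to 0 as recovery completes; ASSUMED 0 if recovery_days is 0.
    time_decay = max(0.0, 1 - days_elapsed / recovery_days) if recovery_days > 0 else 0.0
    return ((strategic_importance / 10) * (production_impact_pct / 100)
            * 10 * time_decay * confidence)


def infrastructure_val(scores):
    ranked = sorted(scores, reverse=True)  # ASSUMPTION: heaviest weight on the worst facility
    total = sum(s * STACK_WEIGHTS[min(i, len(STACK_WEIGHTS) - 1)]
                for i, s in enumerate(ranked))
    return min(total, 10.0)
```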

Status	Condition
DISRUPTED	confidence ≥ 0.65 and production_impact_pct > 0
DEGRADED	confidence 0.40–0.64 and production_impact_pct > 0
AT RISK	on the Iran named threat list OR within 50 km of a FIRMS detection (last 48h)
OPERATIONAL	none of the above
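The status rules can be sketched in Python. Following the AT_RISK note in the disruption tracker, an active damage record takes precedence over the AT RISK checks. The sketch assumes that a damage record with confidence below 0.40 falls through to the AT RISK/OPERATIONAL checks, since the table does not cover that case.

```python
from typing import Optional


def facility_status(confidence: Optional[float], production_impact_pct: float,
                    on_threat_list: bool, firms_within_50km_48h: bool) -> str:
    """confidence is None when the facility has no active damage record."""
    if confidence is not None and production_impact_pct > 0:
        if confidence >= 0.65:
            return "DISRUPTED"
        if confidence >= 0.40:
            return "DEGRADED"
        # ASSUMPTION: low-confidence records fall through to the checks below.
    if on_threat_list or firms_within_50km_48h:
        return "AT RISK"
    return "OPERATIONAL"
```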

Multi-Agent Intelligence Pipeline

Multiple autonomous agents run continuously to maintain accuracy:

Damage Assessor — every 15 min

Monitors all events with escalation score ≥ 7. When a known facility name is matched via _match_facility() or FACILITY_ALIASES, fires a Groq LLM call to assess damage level, production impact, and recovery estimate. Creates or updates damage records automatically.

Verifier Agent — every 3 hours

Cross-checks existing automated damage records against live web search using Claude Sonnet (claude-sonnet-4-6). Synthesizes authoritative sources (Reuters, Bloomberg, official government statements). Auto-downgrades records where evidence contradicts the stored assessment. Upgrades are never applied automatically; they are always flagged for human review.

Source quality tiers: high (Reuters, Bloomberg, official ministry) · medium (Al Jazeera, CNN, BBC) · low (Telegram, IRGC claims, Iranian state media)

GOR Discovery Agent — daily 08:00 UTC

Searches the live web for confirmed energy infrastructure damage events not yet in the damage registry. Uses Claude Haiku (claude-haiku-4-5-20251001) with web_search. Creates draft records (confidence ≤ 0.80, status="draft") for operator review. Drafts auto-expire after 48h if unconfirmed. Does not affect GOR index until confirmed.

FSI Discovery Agent — daily 10:00 UTC

Searches for current fuel reserve levels and shortage signals for all 22 FSI-tracked countries. Uses Claude Haiku with web_search. Writes to fsi_events and updates fsi_reserves when credible reserve figures are found (confidence ≥ 0.5). Redis dedup prevents duplicate events (25h TTL). Average 14 countries scanned per run, 28 web searches total.

Adversarial Surge Detector — at ingest time

If 3+ low-quality sources (Telegram channels, IRGC, Iranian state media) report the same facility + event type within one scoring cycle, confidence is capped at 0.35. Coordinated disinformation surges produce lower confidence, not higher.

Low-quality sources: presstv.ir, tasnimnews.com, irna.ir, and 4 Telegram OSINT channels
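The surge rule can be sketched in Python over one scoring cycle's reports. The Telegram channels are not named, so the sketch lists only the three domains. Two further points are assumptions:

- "3+ sources" means three distinct low-quality sources;
- the 0.35 cap is applied to the low-quality reports in the surging group, leaving an independent high-quality report untouched.

The facility name in the test is illustrative.

```python
from collections import defaultdict

# Domains named in the methodology; the 4 Telegram channels are not named here.
LOW_QUALITY_SOURCES = {"presstv.ir", "tasnimnews.com", "irna.ir"}
SURGE_MIN_SOURCES = 3
SURGE_CONFIDENCE_CAP = 0.35


def apply_surge_cap(reports, low_quality=LOW_QUALITY_SOURCES):
    """reports: dicts with source, facility, event_type, confidence (one scoring cycle)."""
    sources_by_key = defaultdict(set)
    for r in reports:
        if r["source"] in low_quality:
            sources_by_key[(r["facility"], r["event_type"])].add(r["source"])
    surged = {k for k, srcs in sources_by_key.items() if len(srcs) >= SURGE_MIN_SOURCES}
    capped = []
    for r in reports:
        # ASSUMPTION: only the low-quality reports in a surging group are capped.
        if (r["facility"], r["event_type"]) in surged and r["source"] in low_quality:
            r = {**r, "confidence": min(r["confidence"], SURGE_CONFIDENCE_CAP)}
        capped.append(r)
    return capped
```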

Global Supply Disruption Tracker

The /disruption page computes a real-time supply disruption composite across four commodity categories, updated every 5 minutes (Redis-cached for 300s):

Commodity	Share of global capacity tracked	Capacity covered
Crude Production	44%	45,000 kbd
LNG Export	77%	310 Mt/year
Refining	18%	18,000 kbd
Crude Export	69%	38,000 kbd

per-commodity disrupted_pct = disrupted_capacity / total_capacity × 100
composite = coverage-weighted average across commodities
global_estimate = composite × average_coverage_pct
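The three formulas can be sketched in Python. The sketch assumes the coverage shares listed above are the weights in the coverage-weighted average, and that `average_coverage_pct` is the simple mean of those shares across the commodities supplied.

```python
# Share of global capacity covered by the tracked facility set, per commodity.
COVERAGE = {"crude_production": 0.44, "lng_export": 0.77,
            "refining": 0.18, "crude_export": 0.69}


def disrupted_pct(disrupted_capacity, total_capacity):
    return disrupted_capacity / total_capacity * 100 if total_capacity else 0.0


def disruption_summary(pct_by_commodity):
    """Return (composite, global_estimate) from per-commodity disrupted percentages."""
    weights = {c: COVERAGE[c] for c in pct_by_commodity}
    composite = sum(pct_by_commodity[c] * w for c, w in weights.items()) / sum(weights.values())
    average_coverage = sum(weights.values()) / len(weights)
    return composite, composite * average_coverage
```

Scaling by average coverage implicitly treats untracked capacity as undisrupted.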

AT_RISK classification uses two layers: (1) Iran named threat list (SAMREF, Jubail Industrial City, Al Hosn, Mesaieed, Ras Laffan) and (2) FIRMS proximity — any facility within 50km of a NASA FIRMS fire detection in the last 48h (min_frp-filtered; industrial flaring excluded). Facilities with active damage records are classified DISRUPTED/DEGRADED, not AT_RISK.

Event Scoring Pipeline

Every ingested event passes through a multi-stage pipeline before contributing to the index:

1. Ingestion: event collected from source (RSS, API, WebSocket, scraper)

2. Keyword Triage: matched against 100+ Tier 1/2 keywords. Unmatched events are dropped.

3. Cross-Source Dedup: trigram similarity (Jaccard ≥ 0.65) against a 4h headline window in Redis

4. Deterministic Check: FIRMS, ACLED, USGS, NOAA scored by formula, bypassing the LLM entirely

5. LLM Scoring: remaining events scored by Groq (llama) for escalation, disruption, actor capability

6. Index Computation: scored events feed into sub-indices every 5 minutes
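The cross-source dedup step can be sketched in Python with an in-memory list standing in for the Redis window. The sketch assumes character trigrams over a lower-cased, whitespace-normalised headline, since the methodology does not say whether trigrams are character- or word-based. Duplicates are not added to the window.

```python
WINDOW_SECONDS = 4 * 3600
THRESHOLD = 0.65


def trigrams(text):
    """Character trigrams of a lower-cased, whitespace-normalised headline."""
    norm = " ".join(text.lower().split())
    return {norm[i:i + 3] for i in range(len(norm) - 2)} or {norm}


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


class HeadlineDeduper:
    """In-memory stand-in for the 4h Redis headline window."""

    def __init__(self):
        self._window = []  # (timestamp_seconds, trigram set)

    def is_duplicate(self, headline, now):
        # Drop headlines older than the window before comparing.
        self._window = [(t, g) for t, g in self._window if now - t <= WINDOW_SECONDS]
        grams = trigrams(headline)
        if any(jaccard(grams, g) >= THRESHOLD for _, g in self._window):
            return True
        self._window.append((now, grams))
        return False
```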

Event Confidence Pipeline

Every event entering the NLP subindex is assigned a confidence stage that determines its weight in index computation. This separates speed of detection (preserved) from credibility of sustained elevation (enforced).

Stage	Trigger	NLP Weight
UNCONFIRMED	Single low/medium-tier source (Telegram, RSS, OSINT)	0.70×
CORROBORATED	Cross-signal detected within 90-min window	1.00×
VERIFIED	Verifier agent confirmed via authoritative source	1.15×
EXPIRED	90-min window closed, no corroboration received	0.30×

Corroboration signals — any one is sufficient

—Independent text source from a different source class covering the same facility or country, within the 90-minute window
—OVX volatility z-score spike ≥ 1.5σ within 60 minutes of event
—Hormuz transit count drop ≥ 1.5σ within 4 hours
—NASA FIRMS thermal detection within 50km of the named facility within 48 hours
—USGS M4+ seismic event within 30km of named facility within 2 hours (planned — requires geospatial enrichment)
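The stage assignment can be sketched in Python, with each corroboration signal checked against its own window from the list above. The planned USGS signal is omitted. The sketch makes three assumptions:

- signals may precede or follow the event;
- a signal arriving after the 90-minute window still upgrades an EXPIRED event;
- the signal kind names are hypothetical.

```python
from datetime import datetime, timedelta

UNCONFIRMED_WINDOW = timedelta(minutes=90)
# Per-signal corroboration windows (signal kind names are hypothetical).
SIGNAL_WINDOWS = {
    "independent_text": timedelta(minutes=90),
    "ovx_spike": timedelta(minutes=60),
    "hormuz_drop": timedelta(hours=4),
    "firms_nearby": timedelta(hours=48),
}
NLP_WEIGHT = {"UNCONFIRMED": 0.70, "CORROBORATED": 1.00, "VERIFIED": 1.15, "EXPIRED": 0.30}


def confidence_stage(event_time, now, signals=(), verified=False):
    """signals: iterable of (kind, observed_at) pairs already matched to the event."""
    if verified:
        return "VERIFIED"
    for kind, observed_at in signals:
        if observed_at <= now and abs(observed_at - event_time) <= SIGNAL_WINDOWS[kind]:
            return "CORROBORATED"
    return "EXPIRED" if now - event_time > UNCONFIRMED_WINDOW else "UNCONFIRMED"
```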

Why 0.70× not 0.00×: OSINT sources, particularly Telegram channels, frequently detect real events 30–90 minutes before mainstream confirmation. Zeroing their contribution would sacrifice early warning capability. The 0.70× weight ensures the index moves on credible OSINT signals while preventing single unconfirmed sources from triggering regime escalation alone.

What users see: Event stage badges are visible on the live feed — UNCONFIRMED, CORROBORATED, VERIFIED, EXPIRED. Sustained elevation indicates corroboration; spike-and-retreat indicates single-source noise.

Data Sources (18 Active)

Source	Type	Schedule	Reliability	Scoring
GDELT	News	15 min	0.85	LLM
RSS (11 feeds)	News	10 min	0.55–0.95	LLM
NewsAPI	News	15 min	0.85	LLM
Telegram (11 ch)	OSINT	15 min	0.55–0.80	LLM
AIS (aisstream)	Maritime	5 min	0.99	FORMULA
yfinance	Market	5 min	0.90	FORMULA
NASA FIRMS	Satellite	1 hour	0.99	FORMULA
NOAA NHC	Weather	1 hour	0.99	FORMULA
USGS	Seismic	1 hour	0.99	FORMULA
IAEA	Official	2 hours	0.95	LLM
EIA	Official	6 hours	0.99	FORMULA
EIA (expanded)	Official	6 hours	0.99	FORMULA
OpenSanctions	Regulatory	6 hours	0.95	FORMULA
ACLED	Conflict	6 hours	0.90	FORMULA
UN ReliefWeb	Official	6 hours	0.90	FORMULA
Groq Vision	Satellite/Media	15 min	0.75	LLM
Verifier Agent	Intelligence	3 hours	0.90	LLM
GOR Discovery Agent	Intelligence	Daily	0.80	LLM
FSI Discovery Agent	Intelligence	Daily	0.80	LLM

FORMULA-scored sources bypass the LLM entirely, reducing API costs by ~60%. All data feeds are free-tier. Several require free API-key registration, including ACLED, NewsAPI, EIA, NASA FIRMS, and aisstream.io.

LLM Budget Optimization

The system uses the Groq free tier (llama-3.3-70b-versatile) with a daily limit of 14,400 requests. Three mechanisms minimize LLM usage:

Deterministic Scoring

FIRMS, ACLED, USGS, NOAA, EIA, ReliefWeb, and OpenSanctions are scored by formula. Saves ~40–60% of LLM calls.

Cross-Source Dedup

Same event from Reuters + GDELT + NewsAPI is detected via trigram Jaccard similarity (≥0.65) against a 4-hour Redis window. Saves ~30–50%.

Keyword Triage

Events not matching any Tier 1/2 keyword are dropped before reaching the LLM. 100+ keywords across military, energy, sanctions, and shipping categories.

Verifier + Discovery (Claude Haiku)

Used for web search synthesis (Verifier every 3h on new records) and gap discovery (Discovery daily). ~$0.60/day. Auto-downgrades overstated damage from low-quality sources. Model: claude-haiku-4-5

Forecast (Claude Sonnet)

3 API calls/day for macro projections, one per scenario. ~$0.38/day. Energy prices are formula-driven and don't consume LLM tokens. Model: claude-sonnet-4-6

Intelligence Briefs (Groq)

Hourly analyst-style prose briefings are generated from live scored events on the Groq free tier. Each brief has a title on line 1 followed by 4 flowing paragraphs. A run is skipped gracefully on an HTTP 429 rate-limit response. Model: llama-3.3-70b-versatile

Keyword Taxonomy

Events are filtered and boosted using a two-tier keyword taxonomy:

TIER 1 — Direct Supply Threats (3.0× multiplier)

Military operations: airstrikes, missile attacks, naval blockade
Infrastructure: pipeline sabotage, refinery explosion, port closure
Maritime: tanker seizure, mine threat, Hormuz closure
Sanctions: oil embargo, SWIFT exclusion, dark fleet
Nuclear: weapons-grade, breakout capability

TIER 2 — Escalation Indicators (1.5× multiplier)

Diplomatic: sanctions threat, JCPOA withdrawal, normalization collapse
Military posturing: troop buildup, naval deployment, IRGC
Economic: production cut, OPEC+ quota, SPR release
Shipping stress: war risk insurance, freight premium, dark fleet
Political: regime change, coup, Islamic Resistance in Iraq
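Triage against the taxonomy can be sketched in Python with whole-word regular expressions. The sketch includes only the example phrases listed above, not the full 100+ keyword set. It makes two further assumptions:

- a trailing plural "s" is allowed, so "airstrikes" matches;
- Tier 1 takes precedence when a phrase appears in both tiers, as "dark fleet" does.

```python
import re

# Example phrases only: the live taxonomy has 100+ keywords.
TIER_1 = ["airstrike", "missile attack", "naval blockade", "pipeline sabotage",
          "refinery explosion", "port closure", "tanker seizure", "mine threat",
          "hormuz closure", "oil embargo", "swift exclusion", "dark fleet",
          "weapons-grade", "breakout capability"]
TIER_2 = ["sanctions threat", "jcpoa withdrawal", "normalization collapse",
          "troop buildup", "naval deployment", "irgc", "production cut",
          "opec+ quota", "spr release", "war risk insurance", "freight premium",
          "dark fleet", "regime change", "coup", "islamic resistance in iraq"]
TIER_MULTIPLIER = {1: 3.0, 2: 1.5}


def _pattern(phrases):
    # Whole-word match with an optional plural "s"; "coup" does not match "couple".
    alternatives = "|".join(re.escape(p) for p in phrases)
    return re.compile(r"\b(?:" + alternatives + r")s?\b", re.IGNORECASE)


TIER_1_RE, TIER_2_RE = _pattern(TIER_1), _pattern(TIER_2)


def triage(headline):
    """Return the keyword tier (1 or 2), or None if the event should be dropped."""
    if TIER_1_RE.search(headline):
        return 1
    if TIER_2_RE.search(headline):
        return 2
    return None
```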

GOR System — Geopolitical Oil Risk Quantification Engine

All data sources are public domain or free-tier. No classified or proprietary intelligence is used. — March 2026