Why We Needed a Global Sentiment Layer
Tauntaun had 7 signal sources running: FRED macro data, RSS news scanning, geopolitical risk index, credit spreads, Kalshi prediction markets, Google Trends behavioral data, and ORALE prediction market intelligence. That's a solid stack. But every single one of them has the same blind spot: they're all English-centric.
When a trade war escalates, the first signals often appear in Chinese-language financial media. When an oil supply disruption hits, Arabic and Russian outlets report it hours before Reuters picks it up. When a semiconductor shortage develops, the earliest indicators are in Mandarin and Korean industry publications.
Our RSS scanner reads 8 curated English feeds. That's a keyhole view of a global information landscape. We needed something wider.
Enter GDELT
The GDELT Project (Global Database of Events, Language, and Tone) is one of those datasets that sounds too good to be true. It monitors news media worldwide in 100+ languages, processes every article through natural language analysis, and publishes the results every 15 minutes as downloadable files. Completely free. No API key. No rate limits on downloads.
For each article, GDELT extracts:
Themes: tagged from a taxonomy of thousands, such as ECON_RECESSION, MILITARY_CONFLICT, TRADE_DISPUTE, and ECON_INFLATION. We care about 9 economic and geopolitical themes.
Tone: a sentiment score that typically falls between -10 (extremely negative) and +10 (extremely positive). It is computed over the full article text rather than the headline alone, as the share of positively toned words minus the share of negatively toned words.
Every 15 minutes, GDELT publishes a Global Knowledge Graph (GKG) file containing every article processed in that window. Typically 500-1,000 articles per dump. We download one file, parse it, and extract two signals.
The API That Wasn't
We actually built a GDELT integration once before, back on Day 1, using the GDELT DOC API. It worked in testing. Then in production, it started timing out. Requests with a 15-second budget hung indefinitely. Responses came back empty. That kind of flakiness poisons a pipeline running every 30 minutes.
We disabled it and moved on. Seven sources would have to do.
Tonight we came back with a different approach: skip the API entirely. GDELT publishes a master file list at a fixed URL. That file tells you exactly where the latest 15-minute GKG dump lives. Download the zip (~3MB), unzip it, parse the tab-delimited CSV. Pure stdlib Python — urllib, zipfile, csv. No external dependencies. No API that can go flaky.
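A minimal sketch of that download-and-parse path, using only the standard library. The URL below is GDELT's short "last update" listing, which we assume here as the pointer to the newest dump, and the column positions (themes at index 7, tone at index 15) are assumptions taken from the GKG 2.1 layout, not from Tauntaun's code. The function names are hypothetical.

```python
import csv
import io
import urllib.request
import zipfile

# Assumption: GDELT's listing of the newest 15-minute files.
LASTUPDATE_URL = "http://data.gdeltproject.org/gdeltv2/lastupdate.txt"
THEMES_COL = 7   # assumed: semicolon-delimited theme codes
TONE_COL = 15    # assumed: comma-delimited tone fields, first is overall tone

# GKG rows can carry very long fields, so raise the csv field limit.
csv.field_size_limit(10_000_000)


def find_gkg_url(listing_text):
    """Return the URL of the GKG zip from a 'size hash url' listing."""
    for line in listing_text.splitlines():
        parts = line.split()
        if parts and parts[-1].endswith(".gkg.csv.zip"):
            return parts[-1]
    raise ValueError("no GKG entry found in listing")


def parse_gkg_zip(zip_bytes):
    """Parse a zipped GKG dump into a list of (themes, tone) tuples."""
    articles = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        with zf.open(zf.namelist()[0]) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8",
                                    errors="replace", newline="")
            reader = csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
            for row in reader:
                if len(row) <= TONE_COL:
                    continue  # skip truncated or malformed rows
                themes = {t for t in row[THEMES_COL].split(";") if t}
                try:
                    tone = float(row[TONE_COL].split(",")[0])
                except ValueError:
                    continue  # skip rows with no usable tone value
                articles.append((themes, tone))
    return articles


def fetch_latest_gkg(timeout=30):
    """Download the newest GKG dump and parse it (performs network I/O)."""
    with urllib.request.urlopen(LASTUPDATE_URL, timeout=timeout) as resp:
        listing = resp.read().decode("utf-8")
    with urllib.request.urlopen(find_gkg_url(listing), timeout=timeout) as resp:
        return parse_gkg_zip(resp.read())
```

Keeping the parsing separate from the download means the parser can be exercised against a saved dump without touching the network.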
First test: 709 articles parsed in under 2 seconds. No timeouts. No empty responses. The data was just sitting there the whole time.
What We Extract
Signal Type 1: Theme Spike Detection
For each of our 9 tracked themes (inflation, recession, unemployment, trade disputes, military conflict, sanctions, oil, interest rates, tax policy), we count how many articles carrying that theme appeared in the latest dump. We compare that to the rolling average from the last 14 days. If the latest count is 2x the baseline or more, that's a spike: the world is suddenly talking about this topic more than normal.
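The spike check can be sketched roughly as follows. This is an illustration under assumptions, not the production module: articles are taken to be (themes, tone) tuples, the history is a plain dict of past per-dump counts covering the 14-day window, and the names `count_themes` and `detect_spikes` are hypothetical.

```python
from collections import Counter
from statistics import mean


def count_themes(articles, tracked):
    """Count how many articles carry each tracked theme."""
    counts = Counter({theme: 0 for theme in tracked})
    for themes, _tone in articles:
        for theme in tracked & themes:
            counts[theme] += 1
    return counts


def detect_spikes(latest, history, ratio=2.0):
    """Return {theme: multiple of baseline} for themes at or above `ratio`.

    `history` maps each theme to a list of past counts; themes with no
    history or a zero baseline are skipped rather than flagged.
    """
    spikes = {}
    for theme, count in latest.items():
        past = history.get(theme, [])
        if not past:
            continue
        baseline = mean(past)
        if baseline > 0 and count >= ratio * baseline:
            spikes[theme] = count / baseline
    return spikes
```

Skipping themes with an empty or zero baseline is a design choice in this sketch: it avoids dividing by zero and avoids flagging a spike before any history has accumulated.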
Spikes trigger directional signals. A recession article spike → short SPY, long TLT, long GLD. A military conflict spike → long defense ETFs (ITA, XAR), long gold. The mapping is the same one we had before, just fed by data that actually arrives reliably.
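As a rough illustration, the two mappings named above could live in a lookup table like this one. Only the recession and military conflict entries come from this post. Treating "long gold" as GLD is an assumption, and the table and function names are hypothetical.

```python
# Theme code -> list of (ticker, direction). Only the two mappings
# described in the post are included here.
SPIKE_SIGNALS = {
    "ECON_RECESSION": [("SPY", "short"), ("TLT", "long"), ("GLD", "long")],
    # Assumption: "long gold" is expressed through GLD.
    "MILITARY_CONFLICT": [("ITA", "long"), ("XAR", "long"), ("GLD", "long")],
}


def signals_for_spikes(spiking_themes):
    """Expand spiking themes into a de-duplicated list of signals."""
    signals = []
    for theme in sorted(spiking_themes):  # sorted for deterministic output
        for signal in SPIKE_SIGNALS.get(theme, []):
            if signal not in signals:
                signals.append(signal)
    return signals
```

This sketch does not resolve conflicting directions on the same ticker; a real pipeline would need a rule for that case.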
Signal Type 2: Overall Economic Sentiment
We filter to articles tagged with any economic theme, then average their tone scores. That gives us a single number: the global economic mood right now, across all languages, all countries, all media.
We normalize it to a -1.0 to +1.0 scale and bucket it:
Below -0.5 → extreme negative sentiment → risk-off signals (short SPY, long TLT, long GLD)
Between -0.5 and -0.25 → mildly negative → defensive lean (long staples)
Between -0.25 and +0.5 → neutral → no signal (this is correct behavior)
Above +0.5 → strong positive → risk-on signals (long SPY, long QQQ)
First live reading: -0.27 — mildly bearish. Generated one defensive signal. Sounds about right for April 2026.
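Here is one way the sentiment score and its buckets could be computed. The post does not say how the tone average is normalized, so dividing by 10 and clamping to [-1, 1] is an assumption, as are the function names and the bucket labels.

```python
from statistics import mean


def economic_sentiment(articles, econ_themes):
    """Average tone of articles tagged with any economic theme.

    Assumption: normalize by dividing the mean tone by 10 and clamping
    to [-1.0, 1.0]. Returns None when no article matches.
    """
    tones = [tone for themes, tone in articles if themes & econ_themes]
    if not tones:
        return None
    return max(-1.0, min(1.0, mean(tones) / 10.0))


def classify_sentiment(score):
    """Map a normalized score to one of the four buckets."""
    if score is None:
        return "neutral"      # no data means no signal
    if score < -0.5:
        return "risk_off"     # short SPY, long TLT, long GLD
    if score < -0.25:
        return "defensive"    # long staples
    if score > 0.5:
        return "risk_on"      # long SPY, long QQQ
    return "neutral"          # no signal
```

With these thresholds, a reading of -0.27 falls in the defensive bucket, which matches the single defensive signal from the first live run.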
The Regime Vote
This is the part that matters most for where we're headed. The GDELT module also exposes a get_regime_vote() function that returns one of three values: RISK_ON, RISK_OFF, or NEUTRAL.
Right now, it's based on the sentiment score and the number of active crisis themes spiking. But it's designed as a building block. When we build the regime engine — the system that asks "what kind of market are we in?" instead of "what should I trade?" — every signal source will cast a regime vote. GDELT will be one voice in that chorus.
For now, it voted NEUTRAL. Mildly bearish sentiment but no crisis themes spiking. That's a "nothing alarming" reading, which is exactly what you'd expect on a quiet Wednesday in April.
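The exact rule inside get_regime_vote() is not spelled out here, so the following is only a guess at its shape. The thresholds, the two-spike crisis limit, and the name `regime_vote_sketch` are all assumptions chosen to be consistent with the reading above (mildly bearish sentiment, no crisis spikes, NEUTRAL vote).

```python
from enum import Enum


class Regime(Enum):
    RISK_ON = "RISK_ON"
    RISK_OFF = "RISK_OFF"
    NEUTRAL = "NEUTRAL"


def regime_vote_sketch(sentiment, crisis_spikes,
                       off_below=-0.5, on_above=0.5, crisis_limit=2):
    """Hypothetical vote: combine sentiment with the crisis spike count.

    Assumed rule: strongly negative sentiment or several crisis themes
    spiking votes RISK_OFF; strongly positive sentiment with no crisis
    spikes votes RISK_ON; everything else is NEUTRAL.
    """
    if sentiment < off_below or crisis_spikes >= crisis_limit:
        return Regime.RISK_OFF
    if sentiment > on_above and crisis_spikes == 0:
        return Regime.RISK_ON
    return Regime.NEUTRAL
```

Returning an enum rather than a bare string keeps the three possible votes explicit, which should help once other sources cast votes in the same format.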
The Full Stack: Eight Sources
Here's what Tauntaun sees now, every 30 minutes:
| Source | What It Sees | Update Frequency |
|---|---|---|
| FRED Macro | 9 economic series — yield curve, unemployment claims, ISM, housing starts, etc. | Per pipeline run |
| RSS News | 8 curated feeds, 16 themes — earnings, Fed, trade, tech, defense | Per pipeline run |
| GPR Index | Caldara-Iacoviello Geopolitical Risk score (Fed economists) | Per pipeline run |
| Credit Spreads | ICE BofA High Yield OAS — institutional fear gauge, daily since 1996 | Per pipeline run |
| ORALE Bridge | 1,300+ apex wallets on Polymarket, prediction market consensus | Per pipeline run |
| Kalshi Markets | CFTC-regulated contracts — Fed rates, GDP, CPI probability distributions | 4h cached |
| Google Trends | Behavioral stress/euphoria — "withdraw 401k" vs "how to invest" search ratios | 20h cached (weekly data) |
| GDELT (new) | Global news sentiment + theme spikes across 100+ languages | 4h cached |
Eight independent views of the same world. Some measure what institutions are doing (credit spreads). Some measure what retail is feeling (Google Trends). Some measure what crowds are betting (Kalshi, ORALE). Some measure what's actually happening in the world (GDELT, RSS, GPR). And one measures the economic plumbing underneath all of it (FRED).
No single source is reliable enough to trade on alone. But when they start agreeing, that convergence means something.
What We Also Shipped Tonight
GDELT wasn't the only change. Earlier this evening, we replaced the flat risk management rules with ATR-based trailing stops — each instrument now gets a stop distance calibrated to its own volatility. That's documented in the previous entry.
Combined: Tauntaun is now seeing more of the world (GDELT) and managing risk more intelligently (per-instrument volatility stops). Two upgrades in one night. We'll see how they perform when markets open tomorrow.
The Lesson
Sometimes the data you need is sitting in a publicly available CSV file, updating every 15 minutes, and you've been trying to access it through a flaky API wrapper that obscures the simplicity of what's underneath. GDELT publishes raw files. We were using their search API. The files were always better.
When the API fails, go to the source.