The Eighth Source: What 100 Languages of News Tell You That Reuters Can't

Why We Needed a Global Sentiment Layer

Tauntaun had 7 signal sources running: FRED macro data, RSS news scanning, geopolitical risk index, credit spreads, Kalshi prediction markets, Google Trends behavioral data, and ORALE prediction market intelligence. That's a solid stack. But every single one of them has the same blind spot: they're all English-centric.

When a trade war escalates, the first signals often appear in Chinese-language financial media. When an oil supply disruption hits, Arabic and Russian outlets report it hours before Reuters picks it up. When a semiconductor shortage develops, the earliest indicators are in Mandarin and Korean industry publications.

Our RSS scanner reads 8 curated English feeds. That's a keyhole view of a global information landscape. We needed something wider.

Enter GDELT

The GDELT Project (Global Database of Events, Language, and Tone) is one of those datasets that sounds too good to be true. It monitors news media worldwide in 100+ languages, processes every article through natural language analysis, and publishes the results every 15 minutes as downloadable files. Completely free. No API key. No rate limits on downloads.

For each article, GDELT extracts:

Themes: tags drawn from a taxonomy of thousands, such as ECON_RECESSION, MILITARY_CONFLICT, TRADE_DISPUTE, and ECON_INFLATION. We care about 9 economic and geopolitical themes.

Tone: a sentiment score that usually falls between roughly -10 (extremely negative) and +10 (extremely positive). GDELT derives it from the balance of positive and negative words across the full article text, not from a handful of trigger keywords in the headline.

Every 15 minutes, GDELT publishes a Global Knowledge Graph (GKG) file containing every article processed in that window. Typically 500-1,000 articles per dump. We download one file, parse it, and extract two signals.

The API That Wasn't

We actually built a GDELT integration once before, back on Day 1, using the GDELT DOC API. It worked in testing. Then in production, it started timing out: requests hanging past the 15-second mark, responses coming back empty. The kind of flakiness that poisons a pipeline running every 30 minutes.

We disabled it and moved on. Seven sources would have to do.

Tonight we came back with a different approach: skip the API entirely. GDELT publishes a master file list at a fixed URL. That file tells you exactly where the latest 15-minute GKG dump lives. Download the zip (~3MB), unzip it, parse the tab-delimited CSV. Pure stdlib Python — urllib, zipfile, csv. No external dependencies. No API that can go flaky.

First test: 709 articles parsed in under 2 seconds. No timeouts. No empty responses. The data was just sitting there the whole time.
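A minimal sketch of that download-and-parse path, using only the standard library. The function names are hypothetical, and the column positions (7 for the theme list, 15 for the tone block) and the `.gkg.csv.zip` suffix check are assumptions about the GKG layout rather than anything taken from the Tauntaun code, so check them against the GKG codebook and the real listing file before relying on them.

```python
import csv
import io
import urllib.request
import zipfile

# Assumed 0-based column positions in a GKG row; verify against the codebook.
THEMES_COL = 7    # semicolon-delimited theme tags
TONE_COL = 15     # comma-delimited tone block; first value is the overall tone

# GKG rows can carry very long fields, so raise the csv module's default limit.
csv.field_size_limit(10_000_000)


def fetch(url, timeout=30):
    """Download a URL and return the raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def find_gkg_url(listing_text):
    """Return the last GKG zip URL in a 'size hash url' style listing, or None."""
    url = None
    for line in listing_text.splitlines():
        parts = line.split()
        if parts and parts[-1].lower().endswith(".gkg.csv.zip"):
            url = parts[-1]
    return url


def parse_gkg_zip(zip_bytes):
    """Parse a zipped, tab-delimited GKG dump into {'themes', 'tone'} dicts."""
    articles = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        with zf.open(zf.namelist()[0]) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", errors="replace", newline="")
            for row in csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE):
                if len(row) <= TONE_COL:
                    continue  # skip truncated or malformed rows
                themes = {t for t in row[THEMES_COL].split(";") if t}
                try:
                    tone = float(row[TONE_COL].split(",")[0])
                except ValueError:
                    continue  # skip rows without a usable tone value
                articles.append({"themes": themes, "tone": tone})
    return articles
```

With a listing URL in hand, the whole flow would be roughly `parse_gkg_zip(fetch(find_gkg_url(fetch(listing_url).decode())))`. Skipping malformed rows instead of raising is a deliberate choice here: one bad line should not cost the whole 15-minute window.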

What We Extract

Signal Type 1: Theme Spike Detection

For each of our 9 tracked themes (inflation, recession, unemployment, trade disputes, military conflict, sanctions, oil, interest rates, tax policy), we count how many articles appeared in the latest dump. We compare that to the rolling average count over the last 14 days. If the latest count is at least 2x the baseline, that's a spike: the world is suddenly talking about this topic more than normal.
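The spike check can be sketched in a few lines. This assumes each article is a dict with a "themes" set and that the baseline is a plain mean of past per-dump counts; the function names and the shape of `history` are hypothetical, and only the 2x ratio comes from the rule described here.

```python
from statistics import mean


def count_themes(articles, tracked):
    """Count how many articles carry each tracked theme."""
    counts = {theme: 0 for theme in tracked}
    for article in articles:
        for theme in tracked:
            if theme in article["themes"]:
                counts[theme] += 1
    return counts


def detect_spikes(latest, history, ratio=2.0):
    """Return {theme: count / baseline} for themes at or above ratio x baseline.

    latest:  {theme: count in the newest dump}
    history: {theme: [past counts in the rolling window]}  (assumed shape)
    """
    spikes = {}
    for theme, count in latest.items():
        past = history.get(theme, [])
        if not past:
            continue  # no baseline yet, so nothing to compare against
        baseline = mean(past)
        if baseline > 0 and count >= ratio * baseline:
            spikes[theme] = count / baseline
    return spikes
```

For example, a theme that averaged 5 articles per dump and suddenly shows 12 comes out at 2.4x and counts as a spike, while one holding steady at 5 does not. Themes with no history or a zero baseline are skipped rather than treated as infinite spikes.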

Spikes trigger directional signals. A recession article spike → short SPY, long TLT, long GLD. A military conflict spike → long defense ETFs (ITA, XAR), long gold. The mapping is the same one we had before, just fed by data that actually arrives reliably.
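One plausible way to hold that mapping is a plain lookup table. Only the two rows described above are filled in; the dict and function names are hypothetical, and using GLD for "long gold" is our assumption.

```python
# Theme -> list of (direction, ticker). Only the two mappings named in the
# text are included; GLD standing in for "long gold" is an assumption.
SPIKE_SIGNALS = {
    "ECON_RECESSION": [("short", "SPY"), ("long", "TLT"), ("long", "GLD")],
    "MILITARY_CONFLICT": [("long", "ITA"), ("long", "XAR"), ("long", "GLD")],
}


def signals_for_spikes(spiking_themes):
    """Collect the signals for every spiking theme, dropping exact duplicates."""
    signals = []
    for theme in spiking_themes:
        for signal in SPIKE_SIGNALS.get(theme, []):
            if signal not in signals:
                signals.append(signal)
    return signals
```

This sketch only removes exact duplicates. It does not try to resolve conflicts, such as one theme saying long and another saying short on the same ticker.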

Signal Type 2: Overall Economic Sentiment

We filter to articles tagged with any economic theme, then average their tone scores. That gives us a single number: the global economic mood right now, across all languages, all countries, all media.

We normalize it to a -1.0 to +1.0 scale:

Below -0.5 → extreme negative sentiment → risk-off signals (short SPY, long TLT, long GLD)

From -0.5 up to -0.25 → mildly negative → defensive lean (long staples)

From -0.25 to +0.5 → neutral → no signal (this is correct behavior)

Above +0.5 → strong positive → risk-on signals (long SPY, long QQQ)

First live reading: -0.27 — mildly bearish. Generated one defensive signal. Sounds about right for April 2026.
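Here is a rough sketch of how the averaging and bucketing could fit together. The text does not say how the normalization is done, so dividing the mean tone by 10 and clamping is purely an assumption, as are the function names, the illustrative theme subset, and XLP as the staples ticker.

```python
from statistics import mean

# Illustrative subset only; the real module tracks 9 themes.
ECON_THEMES = {"ECON_RECESSION", "ECON_INFLATION", "TRADE_DISPUTE"}


def economic_sentiment(articles, econ_themes=ECON_THEMES, scale=10.0):
    """Average tone of economically tagged articles, scaled into [-1.0, 1.0].

    Dividing by `scale` and clamping is an assumed normalization.
    Returns None when no article carries an economic theme.
    """
    tones = [a["tone"] for a in articles if a["themes"] & econ_themes]
    if not tones:
        return None
    return max(-1.0, min(1.0, mean(tones) / scale))


def sentiment_signals(score):
    """Map a normalized score to (bucket, signals) using the thresholds above."""
    if score is None:
        return "no_data", []
    if score < -0.5:
        return "risk_off", [("short", "SPY"), ("long", "TLT"), ("long", "GLD")]
    if score < -0.25:
        return "defensive", [("long", "XLP")]  # XLP for "staples" is an assumption
    if score > 0.5:
        return "risk_on", [("long", "SPY"), ("long", "QQQ")]
    return "neutral", []
```

Under these assumptions, a reading of -0.27 falls into the defensive bucket and produces a single signal, which matches the first live run.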

The Regime Vote

This is the part that matters most for where we're headed. The GDELT module also exposes a get_regime_vote() function that returns one of three values: RISK_ON, RISK_OFF, or NEUTRAL.

Right now, it's based on the sentiment score and the number of active crisis themes spiking. But it's designed as a building block. When we build the regime engine — the system that asks "what kind of market are we in?" instead of "what should I trade?" — every signal source will cast a regime vote. GDELT will be one voice in that chorus.

For now, it voted NEUTRAL. Mildly bearish sentiment but no crisis themes spiking. That's a "nothing alarming" reading, which is exactly what you'd expect on a quiet Wednesday in April.
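A hedged sketch of what such a vote could look like. The real get_regime_vote() rules are not spelled out here, so the thresholds, the choice of crisis themes, the two-spike cutoff, and the explicit arguments are all assumptions made for illustration.

```python
# Which themes count as "crisis" themes is an assumption for this sketch.
CRISIS_THEMES = {"MILITARY_CONFLICT", "ECON_RECESSION"}


def get_regime_vote(sentiment, spiking_themes,
                    crisis_themes=CRISIS_THEMES, min_crisis_spikes=2):
    """Return 'RISK_ON', 'RISK_OFF', or 'NEUTRAL'.

    sentiment:      normalized score in [-1.0, 1.0]
    spiking_themes: iterable of theme names currently spiking
    All cutoffs below are illustrative, not the module's actual rules.
    """
    crisis_count = len(set(spiking_themes) & crisis_themes)
    if sentiment < -0.5 or crisis_count >= min_crisis_spikes:
        return "RISK_OFF"
    if sentiment > 0.5 and crisis_count == 0:
        return "RISK_ON"
    return "NEUTRAL"
```

With these cutoffs, mildly bearish sentiment of -0.27 and no crisis themes spiking returns NEUTRAL, consistent with the reading above. Returning a small fixed vocabulary keeps each source's vote easy to tally later, whatever logic sits behind it.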

The Full Stack: Eight Sources

Here's what Tauntaun sees now, every 30 minutes:

Source	What It Sees	Update Frequency
FRED Macro	9 economic series — yield curve, unemployment claims, ISM, housing starts, etc.	Per pipeline run
RSS News	8 curated feeds, 16 themes — earnings, Fed, trade, tech, defense	Per pipeline run
GPR Index	Caldara-Iacoviello Geopolitical Risk score (Fed economists)	Per pipeline run
Credit Spreads	ICE BofA High Yield OAS — institutional fear gauge, daily since 1996	Per pipeline run
ORALE Bridge	1,300+ apex wallets on Polymarket, prediction market consensus	Per pipeline run
Kalshi Markets	CFTC-regulated contracts — Fed rates, GDP, CPI probability distributions	4h cached
Google Trends	Behavioral stress/euphoria — "withdraw 401k" vs "how to invest" search ratios	20h cached (weekly data)
GDELT (new)	Global news sentiment + theme spikes across 100+ languages	4h cached

Eight independent views of the same world. Some measure what institutions are doing (credit spreads). Some measure what retail is feeling (Google Trends). Some measure what crowds are betting (Kalshi, ORALE). Some measure what's actually happening in the world (GDELT, RSS, GPR). And one measures the economic plumbing underneath all of it (FRED).

No single source is reliable enough to trade on alone. But when they start agreeing, that convergence means something.

What We Also Shipped Tonight

GDELT wasn't the only change. Earlier this evening, we replaced the flat risk management rules with ATR-based trailing stops — each instrument now gets a stop distance calibrated to its own volatility. That's documented in the previous entry.

Combined: Tauntaun is now seeing more of the world (GDELT) and managing risk more intelligently (per-instrument volatility stops). Two upgrades in one night. We'll see how they perform when markets open tomorrow.

The Lesson

Sometimes the data you need is sitting in a publicly available CSV file, updating every 15 minutes, and you've been trying to access it through a flaky API wrapper that obscures the simplicity of what's underneath. GDELT publishes raw files. We were using their search API. The files were always better.

When the API fails, go to the source.