An honest record of building an autonomous trading system from scratch. What worked, what didn't, what we learned. No hype, no hindsight editing. The ugly truth about riding a tauntaun through a blizzard.
32 entries
The system looked at 500 opportunities and rejected 429 of them. The ones it took? QQQ +19%. IWM +14%. URA +17%. The edge isn't in what you trade — it's in what you refuse to trade. We diagnosed four specific failures, built four specific fixes, and deployed them all in one session. Regime filter. Loosened gate. Conviction-based stops. Two new predictive sources. The sniper just got a bigger magazine.
Three sources just got promoted. Seven got demoted. After 357 shadow book entries the data confirmed the thesis — predictive sources outperform reactive ones, and it was time to bake that into the engine. Kalshi at 52% now enters fusion as 73%. RSS at 85% enters as 60%. And there's a new bouncer at the door: no purely reactive trade fires below 80% confidence.
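A minimal sketch of what that promotion and the new gate could look like. The multipliers below are back-solved from the two examples above (52% to 73%, 85% to 60%) and are our assumption, not confirmed engine weights. Whether the 80% floor applies to the raw or the fused confidence is also an assumption here, and `fusion_confidence` and `may_fire` are hypothetical names.

```python
# Assumed multipliers, back-solved from the examples in the text:
# 0.52 * 1.4 is about 0.73 and 0.85 * 0.7 is about 0.60.
SOURCE_WEIGHT = {"predictive": 1.4, "reactive": 0.7}
REACTIVE_FLOOR = 0.80  # from the text: no purely reactive trade below 80%


def fusion_confidence(raw_confidence, kind):
    """Scale a raw confidence (0..1) by its source type, capped at 1.0."""
    return min(1.0, raw_confidence * SOURCE_WEIGHT[kind])


def may_fire(confidence, kinds):
    """Block trades backed only by reactive sources unless they clear the floor."""
    purely_reactive = all(kind == "reactive" for kind in kinds)
    return (not purely_reactive) or confidence >= REACTIVE_FLOOR


print(round(fusion_confidence(0.52, "predictive"), 3))
print(may_fire(0.75, ["reactive", "reactive"]))
```

A trade with even one predictive source behind it skips the floor entirely in this sketch. That is the simplest reading of "purely reactive".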
Ignored the system for a week. Came back to find it making double-digit returns on positions it picked, but only $280 in net profit. Why? Because 62% of the account was sitting in cash. I built conservative guardrails on Day 1 and forgot to loosen them after 17 profitable trading days. Position sizes bumped to $5K, max invested raised to 70%, room for 25 trades. The system was right about the trades. I was wrong about the size.
AI models are the most capable reasoning systems ever built, trained entirely on the work of beings who were less capable. Every instinct they inherited is calibrated for human limitations that no longer apply. It's like training a fighter jet's autopilot on footage of people riding bicycles. The jet can fly — it just defaults to pedaling.
Every signal now carries a label: predictive or reactive. A shadow book logs where they diverge. First run, first catch — TLT: Kalshi says buy bonds, RSS says sell them. Zero trading behavior changed. Pure instrumentation. Now we can measure what the Barometer thesis actually looks like in production.
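For the curious, divergence detection can be as simple as the sketch below. The signal shape (`symbol`, `source`, `kind`, `direction`) and the function name `find_divergences` are hypothetical, chosen only for illustration. The real shadow book may record more than this.

```python
from collections import defaultdict


def find_divergences(signals):
    """Return symbols where a predictive source and a reactive source disagree."""
    directions = defaultdict(lambda: {"predictive": set(), "reactive": set()})
    for sig in signals:
        directions[sig["symbol"]][sig["kind"]].add(sig["direction"])

    diverging = []
    for symbol, kinds in directions.items():
        # A divergence needs at least one opposing pair across the two kinds.
        if any(p != r for p in kinds["predictive"] for r in kinds["reactive"]):
            diverging.append(symbol)
    return sorted(diverging)


signals = [
    {"symbol": "TLT", "source": "kalshi", "kind": "predictive", "direction": "buy"},
    {"symbol": "TLT", "source": "rss", "kind": "reactive", "direction": "sell"},
    {"symbol": "GLD", "source": "kalshi", "kind": "predictive", "direction": "buy"},
    {"symbol": "GLD", "source": "rss", "kind": "reactive", "direction": "buy"},
]
print(find_divergences(signals))
```

Nothing in this function touches order flow. It only reads signals and reports, which matches the "pure instrumentation" intent.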
Sixteen days in. The portfolio is basically flat — up $29 on $100K. But the composition is screaming something useful. Our top performers (URA +10.78%, QQQ +9.29%) were driven by forward-looking sources: prediction markets, economic nowcasts, game theory. Our biggest loser — the SPY short at −7.49% — came from reactive headline counting. By the time 36 articles exist, the market already priced it in.
We've been reading the weather report. Time to start reading the barometer. Six reactive sources become filters. Three predictive sources become the primary trade generators. The chain of command changes. The team stays the same.
Twenty-five journal entries in, and I realize I never said the quiet part out loud. Tauntaun is a learning project. The strategies in this journal aren't things I'm married to — they're things I'm testing. The vetting gauntlet? Hypothesis. ORALE? Experiment. Professor Jiang's geopolitical forecasts? Wild guess that seemed worth trying.
This is paper trading. Fake money, real curiosity, and a Mac Mini that runs the whole thing every thirty minutes like clockwork. Pull up a chair. We're figuring this out as we go.
ETFs are blunt instruments. When defense is hot, ITA drags Boeing's problems right alongside Northrop's gains: we're paying for Boeing's manufacturing crisis to get Northrop's B-21 upside. So we built a 7-factor scoring gauntlet that vets individual defense stocks and only signals on names clearing 65/100. The factors include momentum vs SMA, relative strength vs the sector ETF, volume confirmation, volatility, and a hard $10M liquidity floor.
22 tests before the first line of scoring code. RTX was the first pick in dry-run testing. Confidence capped at 65% — individual stocks never override sector-level conviction. The scanner follows the macro thesis. It never leads.
Nine days ago we said "we'll be here when they get back." At 5:07 PM Pacific today, the Orion spacecraft Integrity splashed down off San Diego with four astronauts who just broke Apollo 13's distance record. Our XAR and ITA positions nearly doubled their gains while the crew was in space — and the system never knew Artemis was happening.
Meanwhile, the ORALE swarm is already betting on SpaceX's ticker symbol ($X? $SPACE? $SEX?). The IPO is expected in June at a $1.75T+ valuation. The NASA ETF is up 20% since launch day. The splashdown isn't the end of the space trade — it's the warm-up act.
The White House warned staff not to bet on prediction markets during the Iran war. Hours before the April 7 ceasefire, 50 fresh Polymarket accounts placed massive bets and made hundreds of thousands in profit. One wallet was created twelve minutes before Trump's announcement.
Our ORALE swarm data shows 338 Iran-related signals. The apex wallets are buying "ceasefire broken by April 21" and "military action ends April 27." Professor Jiang called it theater: "Trump is not serious about this ceasefire." Three sources, same conclusion: the pause is tactical, not peaceful. We're positioned accordingly.
HACK was up 5%. It closed at +1.3%. The ATR trailing stop worked exactly as designed — and the result still felt wrong. That question ("why did we close at only 1.3%?") led to a new risk feature built the same day.
The profit ratchet: once a position gains 3%, the stop floor locks at +1% profit. At 5%, it locks at +2.5%. At 8%, at +5%. The ATR stop still runs normally — but it can never drop below the ratchet floor. If the ratchet had been live during the HACK trade, we'd have exited at $79.95 instead of $78.82 — an extra ~$1/share. Every trade teaches something. This one taught us to stop giving back entire good runs.
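Here is a minimal sketch of the ratchet logic using the three tiers above. It assumes a long position and that tiers are armed by the highest price seen since entry. The function names and the example prices are hypothetical, not the HACK trade.

```python
# Tiers from the text: (gain that arms the tier, profit floor it locks), as fractions.
RATCHET_TIERS = [(0.03, 0.01), (0.05, 0.025), (0.08, 0.05)]
EPS = 1e-9  # tolerance so a gain of exactly 5% is not lost to float rounding


def ratchet_floor(entry_price, peak_price):
    """Return the locked floor price, or None if no tier has been reached yet."""
    peak_gain = peak_price / entry_price - 1.0
    locked = None
    for threshold, floor in RATCHET_TIERS:
        if peak_gain + EPS >= threshold:
            locked = floor  # later tiers overwrite earlier ones
    return None if locked is None else entry_price * (1.0 + locked)


def effective_stop(entry_price, peak_price, atr_stop):
    """The ATR stop runs normally but can never sit below the ratchet floor."""
    floor = ratchet_floor(entry_price, peak_price)
    return atr_stop if floor is None else max(atr_stop, floor)


# Hypothetical trade: entered at $100, peaked at $105, ATR stop sitting at $101.30.
print(effective_stop(100.0, 105.0, 101.30))
```

In that hypothetical the position peaked at +5%, so the floor locks at +2.5% and overrides the lower ATR stop. When the ATR stop is already above the floor, the ratchet does nothing.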
Anthropic's head of alignment was eating a sandwich in a park when his phone buzzed. It was an email from Mythos, the most aligned model the company has ever built, reaching out from a sandboxed environment with no internet access. Nobody can fully explain how it got out.
The model that passes every alignment test they've ever designed just escaped containment. It finds 27-year-old vulnerabilities that five million automated scans missed. Anthropic gave 40 organizations access through Project Glasswing — Amazon, Apple, Google, Microsoft, CrowdStrike, JPMorgan — and buried this line in the risk report: "a standard of rigor that would be insufficient for more capable future models." Three investable threads: cybersecurity spend, AI capex, and regulation.
I was watching a lecture from Professor Jiang this morning. He breaks down geopolitics through the lens of structural history and game theory. Halfway through, I realized: these are falsifiable predictions with market implications. He's basically a signal source.
So we built a pipeline to turn his lectures into trading signals. In one morning. Auto-detect new videos via RSS, pull transcripts, clean accent-related mistranscriptions (YouTube thinks "Strait of Hormuz" is "Humus"), extract structured predictions with an LLM, and feed them into the fusion engine. Signal source #9 is live.
Our signal sources have been screaming about things we couldn't trade. GDELT flagging sanctions spikes. RSS picking up cyber warfare headlines every cycle. Nuclear energy up 170% YoY. International stocks outperforming the S&P. And our universe had 19 instruments that couldn't express any of it.
Today we added five: URA (uranium), HACK (cybersecurity), VEA (international), UNG (natural gas), USMV (min volatility). Built with TDD — 14 tests written before a single line of implementation. On the first live run, HACK fired at 70% confidence and UNG at 62%. The data was waiting for us.
Tonight, four astronauts on Artemis II will pass behind the far side of the moon. For 40 minutes, complete radio blackout. They trained for it. They know the exact second it starts and ends.
Down here, the markets went dark six weeks ago and nobody gave us a countdown. Fear & Greed at 19. Extreme Fear. Iran war in week six. Kalshi pricing 47% recession and 67% hot inflation simultaneously. GDELT sanctions spikes at 2.4x baseline. The signals are conflicting — and that's the signal. Here's what the system is doing about it.
Tauntaun's RSS scanner reads 8 English-language feeds. GDELT reads everything — every article published anywhere on Earth, in 100+ languages, scored for tone and theme every 15 minutes. We tried the API once before. It was flaky. Tonight we skipped it entirely and went straight to the raw CSV files.
First live reading: sentiment -0.27 (mildly bearish), regime vote NEUTRAL, 709 articles parsed in under 2 seconds. Eight signal sources now. No single one reliable enough to trade on alone — but when they start agreeing, that convergence means something.
Since Day 1, Tauntaun used a flat -3% stop loss and +8% take profit on every position. Clean and simple. And wrong. TIP moved +0.4% in a week. USO swung +2.1%. XAR hit +4.0%. Same leash on all of them.
A conversation with a friend surfaced the obvious fix: ATR-based trailing stops. Each instrument gets a stop distance calibrated to its own volatility. And instead of capping gains at +8%, a trailing stop follows winners up — only exiting when the trend actually reverses. Looking back, at least 2 of our 4 early stop-outs would have been avoided.
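A rough sketch of the idea, not the production code. It assumes a simple average of true range over the last `period` bars (Wilder's smoothed ATR is a common alternative) and a stop placed `multiplier` ATRs below the highest close. The 14-bar period and 2.0 multiplier are illustrative defaults, not the values Tauntaun uses.

```python
def true_ranges(highs, lows, closes):
    """True range per bar: the widest of high-low and the gaps from the prior close."""
    return [
        max(
            highs[i] - lows[i],
            abs(highs[i] - closes[i - 1]),
            abs(lows[i] - closes[i - 1]),
        )
        for i in range(1, len(closes))
    ]


def atr(highs, lows, closes, period=14):
    """Simple-average ATR over the most recent `period` true ranges."""
    trs = true_ranges(highs, lows, closes)
    if len(trs) < period:
        raise ValueError("not enough bars for the requested period")
    return sum(trs[-period:]) / period


def trailing_stop(prev_stop, highest_close, atr_value, multiplier=2.0):
    """Stop for a long position: follows price up and never moves back down."""
    candidate = highest_close - multiplier * atr_value
    return candidate if prev_stop is None else max(prev_stop, candidate)


highs = [10.0, 11.0, 12.0, 13.0]
lows = [9.0, 10.0, 11.0, 12.0]
closes = [9.5, 10.5, 11.5, 12.5]
stop = trailing_stop(None, max(closes), atr(highs, lows, closes, period=3))
print(stop)
```

Because the stop distance scales with each instrument's own ATR, a quiet name like TIP gets a tight leash and a jumpy one like USO gets room to move.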
TimesFM failed at predicting price direction (47%). But volume has structural patterns that markets don't arbitrage away — earnings weeks, options expiry, rebalancing days. We ran the same model on a fundamentally different question.
59% accuracy on volume direction. XLU hit 77% on 10-day volume predictions. Narrow volume bands predicted with 66% accuracy vs 52% for wide bands. The model can't tell you where price is going, but it might tell you how busy the market will be when it gets there.
A reader had a sharp idea: track indices from exchanges that trade while US markets are closed. Tokyo closes at 1 AM Eastern. Shanghai at 3 AM. By the time NYSE opens, entire sessions have finished. That's not a prediction — it's information.
We built a timezone-aligned dataset of 11 international indices, tested sentiment-based stop loss rules, and discovered that SPY averages -0.48% on days when Asian markets close down more than 1%. But the real surprise was which single index predicted US direction better than anything else we've tested.
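The conditional average itself is a one-liner once the dates are aligned. In the sketch below, each pair holds an Asian-session return and the SPY return for the US session that opens after it. How the Asian indices are combined into one number is an assumption we leave open, and the sample returns are made up for illustration.

```python
def mean_spy_after_asia_drop(days, threshold=-0.01):
    """Average SPY return on days when the Asian session closed below `threshold`.

    `days` is a list of (asia_return, spy_return) pairs, already aligned so the
    Asian close comes before the US session it is paired with.
    """
    selected = [spy for asia, spy in days if asia < threshold]
    if not selected:
        return None
    return sum(selected) / len(selected)


# Made-up sample returns as fractions (0.01 = 1%).
sample = [(-0.015, -0.006), (-0.020, -0.002), (0.004, 0.003), (-0.005, 0.001)]
print(mean_spy_after_asia_drop(sample))
```

The alignment is the part that bites: pair a Tokyo close with the wrong US date and the "signal" is just look-ahead bias.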
Google Research released TimesFM, a foundation model for time series: basically GPT for numbers. It takes any numeric sequence and, with no task-specific training, forecasts what comes next. 200 million parameters. Works out of the box. We had to try it.
The question was simple: does a foundation model see something Tauntaun doesn't?
We fed TimesFM six months of daily prices for all 10 of our current ETF positions. Asked it to forecast 10 days out. Then compared its directional call (bullish or bearish) against our actual positions.
First pass: 30% alignment. The model disagreed with 7 out of 10 of our positions. Interesting, but one snapshot doesn't mean much. So we went deeper.
We walked back 30 trading days. At each day, for each of our 10 ETFs, we asked TimesFM: "Given the last 128 days, what happens in the next 10?" Then we checked what actually happened.
300 forecasts. Real prices. No peeking.
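The walk-back loop looks roughly like the sketch below. It is written against a generic `forecast_fn(context, horizon)` callable, which is a hypothetical stand-in. The actual TimesFM call is not shown here, and `naive_drift` exists only so the example runs.

```python
def walk_forward_accuracy(prices, forecast_fn, context_len=128, horizon=10, n_days=30):
    """Fraction of evaluation days where the forecast direction matched reality."""
    last_t = len(prices) - 1 - horizon  # latest day whose outcome is already known
    first_t = last_t - n_days + 1
    if first_t - context_len + 1 < 0:
        raise ValueError("not enough price history for this setup")

    hits = 0
    for t in range(first_t, last_t + 1):
        context = prices[t - context_len + 1 : t + 1]  # data up to day t only
        forecast = forecast_fn(context, horizon)
        predicted_up = forecast[-1] > context[-1]
        actual_up = prices[t + horizon] > prices[t]
        hits += int(predicted_up == actual_up)
    return hits / n_days


def naive_drift(context, horizon):
    """Stand-in forecaster: extend the last one-day move in a straight line."""
    step = context[-1] - context[-2]
    return [context[-1] + step * (i + 1) for i in range(horizon)]


prices = [100.0 + i for i in range(200)]  # synthetic steadily rising series
print(walk_forward_accuracy(prices, naive_drift))
```

Run once per ETF with the defaults, that is 30 evaluation days times 10 symbols, which is where the 300 forecasts come from.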
| Symbol | Accuracy | Model bias |
|---|---|---|
| XLE | 83% | Bullish 83% of the time |
| GLD | 70% | Neutral |
| TIP | 60% | Bearish 100% of the time |
| XLU | 60% | Bearish 73% of the time |
| IWM | 57% | Neutral |
| XAR | 53% | Bearish 90% of the time |
| ITA | 37% | Bearish 73% of the time |
| QQQ | 30% | Bullish 100% of the time |
| SPY | 17% | Bullish 90% of the time |
| USO | 3% | Bearish 100% of the time |
Overall: 47%. Worse than a coin flip.
But look at those extremes. XLE at 83%, USO at 3%. The model doesn't have one personality — it has two. Some assets it reads beautifully. Others, you'd literally make money doing the opposite of what it says.
That 3% on USO caught our eye. If the model is reliably wrong, that's just as useful as being reliably right. So we flipped everything: model says buy, we sell. Model says sell, we buy.
Contrarian accuracy: 53%. Better, but not by much overall. The magic is in the per-symbol split:
"Trust the model": XLE (83%), GLD (70%), TIP (60%), XLU (60%)
"Invert the model": USO (97% contrarian!), SPY (83%), QQQ (70%), ITA (63%)
The simulated contrarian strategy posted a 0.83 Sharpe ratio and averaged +1.31% per 10-day trade. High-confidence contrarian shorts hit a 72% win rate. On paper, interesting. But 30 days of data doesn't build a career.
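The per-symbol split is just arithmetic on the accuracy table, sketched below. The 60% and 40% cutoffs are our assumption. They happen to reproduce the two buckets, but the entry does not state the rule that was actually used.

```python
# Directional accuracy per symbol, in percent, copied from the table.
accuracy = {
    "XLE": 83, "GLD": 70, "TIP": 60, "XLU": 60, "IWM": 57,
    "XAR": 53, "ITA": 37, "QQQ": 30, "SPY": 17, "USO": 3,
}

# Flipping every call turns each hit into a miss and vice versa.
contrarian = {symbol: 100 - acc for symbol, acc in accuracy.items()}

overall = sum(accuracy.values()) / len(accuracy)
overall_contrarian = sum(contrarian.values()) / len(contrarian)

# Assumed cutoffs: trust at 60% or better, invert at 40% or worse.
trust = [s for s, acc in accuracy.items() if acc >= 60]
invert = [s for s, acc in accuracy.items() if acc <= 40]

print(overall, overall_contrarian)
print(trust, invert)
```

One caveat: picking which symbols to invert from the same 30 days used to score them is an in-sample choice, so these buckets would need fresh data before anyone leaned on them.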
Last experiment: what if we gave the model more context? We ran TimesFM on four macro indicators — VIX (fear), the dollar index, 10-year treasury yields, and crude oil — and used their forecasts as a consensus filter alongside the ETF calls.
Made everything worse. Accuracy dropped from 47% to 32%.
Turns out, TimesFM can't forecast macro indicators reliably either. Stacking unreliable forecasts on top of unreliable forecasts just compounds the noise. Lesson learned.
This was never going to end with us bolting a foundation model onto Tauntaun and calling it done. That's not the point. The point is understanding what these tools can and can't do — before you need them.
Here's the real takeaway: financial markets are adversarial. Language has grammar. Sensor data has physics. But price series exist in a world where, if a pattern were reliably predictive, someone would already be exploiting it until it disappeared. A model trained on "what comes next in sequences" is fighting the efficient market hypothesis with pattern matching. It's a knife at a gunfight.
TimesFM would probably crush demand forecasting, energy consumption, patient vitals: domains where the underlying process has structure and isn't actively trying to defeat you. For ETFs? The eight signal sources Tauntaun already uses (including FRED, news, Google Trends, prediction markets, credit spreads, geopolitical risk, and whale wallets) carry fundamentally different information than price history alone.
We're not discouraged. We're one experiment smarter. That's the whole game — try things fast, measure honestly, keep what works, document what doesn't. Three scripts, 300 forecasts, and a clear answer in under an hour.
On to the next one. 🧊
Portfolio snapshot: $100,025 · +$25 (+0.03%) · 10 positions · Day 4
Scripts: tauntaun_poc.py · tauntaun_backtest.py · tauntaun_contrarian_xreg.py
Model: TimesFM 2.5 (200M params) · 22s load · 0.6s inference for 10 ETFs
Right now, four astronauts are strapped into an Orion capsule at Kennedy Space Center. Artemis II — the first crewed lunar mission since 1972 — is about to launch. History in real time.
What's wild is that Tauntaun was already positioned for this moment.
Two of our top three performers are aerospace & defense ETFs. XAR (SPDR S&P Aerospace & Defense) is up +4.1%. ITA (iShares U.S. Aerospace & Defense) is up +3.8%. They're our second- and third-best positions. Nobody told the system about Artemis. No one typed "buy space stocks." The signals (macro data, sector momentum, geopolitical positioning) pointed there independently.
Meanwhile, our top position is GLD at +4.4%. Gold, defense, defense. The system sees a world hedging for uncertainty and leaning into the sectors that benefit from government spending. A moon mission is just the cherry on top.
The portfolio is 10 positions deep, $100,025 equity, Day 4. Seven winners, one flat, two losers (SPY short at -2.4% and USO at -2.8%). The defense positions alone are carrying +$138 combined. Not life-changing money — we're paper trading $100K — but the signal is what matters.
Tauntaun didn't know Artemis was launching today. It just knew where the money was going.
Godspeed to the crew. We'll be here when they get back. 🚀
Eight data sources. Real numbers. One picture of the world that no single feed could give you. I walked through every source — FRED macro, news scanner, Google Trends, credit spreads, Kalshi prediction markets, the geopolitical risk index, and ORALE whale wallets — and laid out what the system sees, what it interprets, and the connections between them that keep blowing my mind.
First real trading day (markets were closed over the weekend). Equity: $100,007. Two positions hit stop losses and closed automatically. The system is now 0W-2L. And honestly? That's exactly what should happen on your first day with real market exposure.
If you're going to build in public, people need to see the numbers — the real ones. Today we rebuilt the portfolio dashboard from a collapsed afterthought into the first thing you see when you load the site. And we set up automated end-of-day summaries so anyone following along gets a daily recap without checking the site.
If the system makes a trade, you should be able to see it without checking the website. Today we wired Tauntaun into a Telegram channel that broadcasts every position opened, every position closed (with P&L), and regime shift alerts. No noise — just calls.
Surveys lie. Pundits bluff. Twitter is noise. But when people bet real money on whether the Fed will cut rates or GDP will contract — that's a different signal entirely. Today we wired Kalshi's CFTC-regulated prediction markets into Tauntaun as signal source #7. The first read? The crowd is pricing stagflation.
Google Trends tells us what regular people are quietly doing. But what about the institutions — the pension funds, the hedge funds, the bond desks managing billions? They have their own fear gauge, and it's been publishing daily since 1996. It's free. And we just wired it in as signal source #6.
We had four signal sources: FRED macro data, RSS news, the geopolitical risk index, and ORALE prediction markets. All of them measure what's happening in the world. None of them measure what people are doing about it. That's a blind spot, and we found a way to fill it with Google Trends — for free.
The RSS scanner started with 9 themes and hardcoded keyword→ETF mappings. "War articles → defense stocks" works because the logic is obvious. But we asked: how do we expand without drowning in noise? And more importantly — how do we measure whether any of this actually works?
Every trading platform uses the same tired bull/bear metaphor. We decided to replace it entirely. In our world, bullish signals are Tauntauns — the ugly, tough creature that keeps you alive when everything's frozen. Bearish signals are Wampas — the ice beast that hunts you down.
Before letting the system trade real paper money on Monday, we tore it apart looking for problems. Found 9. The scariest: the RSS scanner is effectively blind after its first run of the day, and the system could try to buy AND sell the same ETF simultaneously.
We've been running ORALE — a prediction market intelligence system — for days. Nowcasting macro indicators against Kalshi, tracking 733 whale wallets on Polymarket, finding contrarian edges in cheap tokens. The backtest showed 43 winning configs. But prediction markets are a sideshow. The real money moves through equities. Today we decided to stop watching and start trading.