Zion Boggan, zionboggan.com
# Hermes Investigation - 2026-05-17
 
**Context:** "Feels like the data is corrupted - worked for a bit then slowly bled me." Operator-initiated audit after a sustained drawdown.
**Status:** Investigation only. No code changed. Auto-trading remains OFF (paused since 2026-05-14).
**Confidence:** High. The final conclusion is consistent across four independent cuts of the data and required no further reversal.
 
---
 
## Bottom line up front
 
1. **Your instinct was right that something was off - but it is not ongoing corruption or a dead pipeline.**
2. **The money was bled before 2026-04-21** by an overconfident probability model betting NO on coin-flip markets with bad payout math.
3. **The 2026-04-21 "MAE-σ floor" change stopped the bleed.** Trades since then are roughly breakeven. The −$157 total P&L and 48% drawdown you see are *old damage still showing on the cumulative chart*, not fresh losses.
4. **Single-degree (2°F) temperature brackets have no exploitable edge.** They hit ~45% of the time and the payout structure needs a ~66% win rate. No probability model can fix a market with no edge.
5. **A real bug exists but it never cost money:** three diagnostic columns are never written to the database. That logging gap caused *three separate audits* (2026-04-27, and my own first two passes today) to misdiagnose the problem. It is worth fixing for observability only.
 
---
 
## The numbers that settle it
 
### Era split (the decisive cut)
 
| Era | Trades | Avg ensemble prob | P&L | EV/trade |
|---|---|---|---|---|
| Pre-Apr-21 (unclamped Gaussian) | 97 | 0.01-0.14 | **−$160.72** | **−$1.66** |
| Post-Apr-21 (MAE-σ floor active) | 41 | ~0.10 | **+$3.06** | **≈ +$0.07** |
 
**The entire lifetime loss happened before 2026-04-21:** the pre-floor era lost $160.72, against a net lifetime P&L of −$157.66. After the MAE-σ floor was added, the bot stopped bleeding.
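
The per-trade figures in the table can be rechecked from the trade counts and P&L totals alone. The snippet below is a minimal sketch that does only that arithmetic; the `Era` class and its field names are my own, not anything in the Hermes codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Era:
    label: str
    trades: int
    pnl: float  # total realized P&L in dollars

    @property
    def ev_per_trade(self) -> float:
        return self.pnl / self.trades


# Trade counts and P&L totals copied from the era-split table.
eras = [
    Era("pre-2026-04-21 (unclamped Gaussian)", trades=97, pnl=-160.72),
    Era("post-2026-04-21 (MAE-sigma floor)", trades=41, pnl=3.06),
]

for era in eras:
    print(f"{era.label}: EV/trade {era.ev_per_trade:+.2f}")

lifetime_pnl = sum(era.pnl for era in eras)
pre_floor_share = eras[0].pnl / lifetime_pnl  # above 1.0 means the later era clawed a little back
print(f"lifetime P&L {lifetime_pnl:+.2f}, pre-floor share of net loss {pre_floor_share:.0%}")
```

The post-floor era works out to roughly +$0.07 per trade, which is indistinguishable from breakeven at a sample size of 41.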
 
### Why the strategy can't win (the payout math)
 
- Narrow-bracket actual hit rate: **62/138 = 44.9%** - these markets are near coin-flips.
- Realized reward:risk: avg win **$3.32**, avg loss **−$6.46** → ratio **0.51**.
- Break-even win rate at that ratio: **1 / (1 + 0.51) ≈ 66%**.
- Bot's actual win rate: **54.3%**.
 
You cannot make money betting NO on ~coin-flip events when the payout structure demands a 66% win rate. This is a **market-selection problem, not a model problem.** Single-degree temperature brackets sit below NWS/ensemble forecast resolution and Kalshi prices them efficiently.
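
The break-even figure follows from setting expected value to zero: `p * avg_win - (1 - p) * avg_loss = 0`, so `p = avg_loss / (avg_win + avg_loss)`, which is the same as `1 / (1 + ratio)`. Here is a short sketch of that calculation using the realized averages quoted above; the function names are illustrative only.

```python
def breakeven_win_rate(avg_win: float, avg_loss: float) -> float:
    """Win rate at which expected value per trade is exactly zero."""
    return avg_loss / (avg_win + avg_loss)


def ev_per_trade(win_rate: float, avg_win: float, avg_loss: float) -> float:
    """Expected dollars per trade; avg_loss is passed as a positive magnitude."""
    return win_rate * avg_win - (1.0 - win_rate) * avg_loss


AVG_WIN = 3.32    # dollars, average winning trade
AVG_LOSS = 6.46   # dollars, magnitude of the average losing trade
WIN_RATE = 0.543  # the bot's realized win rate

ratio = AVG_WIN / AVG_LOSS
needed = breakeven_win_rate(AVG_WIN, AVG_LOSS)
ev = ev_per_trade(WIN_RATE, AVG_WIN, AVG_LOSS)
print(f"reward:risk {ratio:.2f}, break-even win rate {needed:.1%}, EV/trade {ev:+.2f}")
```

At a 54.3% win rate this gives an expected loss of a little over a dollar per trade, which across 138 trades is in line with the observed lifetime loss.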
 
### What the model actually did wrong
 
The Gaussian model (whether fed by the NWS point forecast or the ensemble) was systematically **overconfident that the bracket would NOT hit**: it assigned a 1-13% hit probability when the realized rate was ~45%. Before Apr-21 the probability was unclamped, so the model would say "1% chance" and bet NO at a spurious 99% confidence, with catastrophic results. The Apr-21 MAE-σ floor crudely clamped the hit probability to a minimum of ~10%, which capped the overconfidence and stopped the catastrophic losses. It did not create a winning strategy: the result is breakeven, not profit.
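
To illustrate the mechanism, here is a minimal sketch of a Gaussian bracket probability with and without a floor on σ. It is not the Hermes implementation: this report does not show how the floor is computed, so the `sigma_floor` parameter, the MAE-to-σ conversion, and every numeric input below are assumptions chosen only to show the effect.

```python
import math


def sigma_from_mae(mae: float) -> float:
    """For zero-mean Gaussian errors, E|X| = sigma * sqrt(2/pi), so sigma = MAE * sqrt(pi/2)."""
    return mae * math.sqrt(math.pi / 2.0)


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def bracket_hit_prob(lo: float, hi: float, mu: float, sigma: float,
                     sigma_floor: float = 0.0) -> float:
    """P(lo <= T < hi) for T ~ Normal(mu, max(sigma, sigma_floor))."""
    s = max(sigma, sigma_floor)
    return normal_cdf(hi, mu, s) - normal_cdf(lo, mu, s)


# Illustrative inputs only: a 70-72°F bracket with the forecast 1.5°F above it.
LO, HI, MU = 70.0, 72.0, 73.5
RAW_SIGMA = 0.7      # hypothetical tight spread
ASSUMED_MAE = 2.0    # hypothetical historical forecast error, °F

p_unclamped = bracket_hit_prob(LO, HI, MU, sigma=RAW_SIGMA)
p_floored = bracket_hit_prob(LO, HI, MU, sigma=RAW_SIGMA,
                             sigma_floor=sigma_from_mae(ASSUMED_MAE))
print(f"unclamped: {p_unclamped:.3f}  floored: {p_floored:.3f}")
```

With a tight σ the bracket next to the forecast gets a hit probability of a few percent, which is what makes a NO bet look nearly certain. Widening σ to something consistent with real forecast error pulls that probability up in this example, which is the direction in which the floor corrected the model.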
 
---
 
## The logging gap (real bug, zero P&L impact)
 
`INSERT INTO trades` at `main.py:1585` lists its columns explicitly. Three columns that exist in the schema are **not in the INSERT** and are therefore never written:
 
- `raw_ensemble_probability` → always NULL
- `model_count` → always DEFAULT 1
- `models_used` → always '' / NULL
 
**Consequence:** anyone (human or AI) inspecting the trade table sees `model_count = 1` and `raw_ensemble_probability = NULL` on every row and concludes "the 31-member ensemble never ran / the pipeline is dead."
 
This is false. Live testing on 2026-05-17 confirmed `fetch_ensemble_forecast()` returns a healthy 31-member ensemble (validate=True, ~1.7°F spread). The ensemble works. The columns are simply never recorded.
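
The gap is easy to reproduce in isolation. The sketch below uses an in-memory SQLite database with a hypothetical, trimmed-down `trades` schema: only the table name and the three diagnostic column names come from this report, and the other columns and the ticker value are placeholders. An INSERT with an explicit column list leaves every omitted column at its default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: the real table has more columns than this.
conn.execute(
    """
    CREATE TABLE trades (
        id INTEGER PRIMARY KEY,
        ticker TEXT NOT NULL,
        ensemble_probability REAL,
        raw_ensemble_probability REAL,
        model_count INTEGER DEFAULT 1,
        models_used TEXT DEFAULT ''
    )
    """
)

# Explicit column list that omits the three diagnostic columns.
conn.execute(
    "INSERT INTO trades (ticker, ensemble_probability) VALUES (?, ?)",
    ("HYPOTHETICAL-TICKER", 0.10),
)

row = conn.execute(
    "SELECT raw_ensemble_probability, model_count, models_used FROM trades"
).fetchone()
print(row)  # (None, 1, '')
```

The stored row is indistinguishable from one produced by a single-model run with no ensemble, regardless of what the forecast code actually computed.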
 
**This single gap caused three misdiagnoses:**
- 2026-04-27 audit: blamed a dead `OPENMETEO_PROXY` node, "fixed" it (the fix addressed a non-problem; that memory entry is now flagged invalid).
- This session, pass 1: I repeated the same "ensemble pipeline died" error.
- This session, pass 2: I then over-corrected to "the MAE-σ floor is destroying the signal and causing the bleed" - also wrong (the era split disproves it).
 
The honest record of that oscillation is preserved in memory. The final era-split reconciliation is internally consistent and required no further reversal.
 
---
 
## Evidence trail (for review)
 
1. `auto_config.enabled = 0` confirmed; last real trade 2026-05-13 (pre-pause). No trades since the 2026-05-14 pause. No ongoing bleeding.
2. 138 settled trades, all single-degree temp brackets, ~100% NO side.
3. Backtest by edge band and the rejected empirical model: **disregard these** - computed before the era split was understood; superseded.
4. Live pipeline tests: ensemble OK (31 members), NWS OK, `OPENMETEO_PROXY = None` (the old trap is not active).
5. Era-sliced P&L + clamp regime + ground-truth hit rate (the tables above): mutually consistent, no contradictions.
 
Supporting files: `empirical_analysis.py`, `h_ds.psv`.
 
---
 
## Options (no action taken - your call)
 
**A. Do nothing / stay paused (lowest risk).**
Hermes is paused and not losing money. The strategy has no edge; "don't trade" is the correct play for a no-edge market. Cost: $0. Benefit: $0.
 
**B. Fix the logging gap only.**
Add the 3 missing columns to the INSERT so future audits aren't blind. ~10-line change, no behavioral effect, auto stays OFF. Recommended regardless of strategy decision - it stops the recurring misdiagnosis.
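
Alongside the fix, a small guard can catch this class of gap before it misleads another audit. The sketch below compares an INSERT's explicit column list with the table's schema and reports any column that is never written. The helper names, the regex-based parsing, and the schema are all hypothetical; the real INSERT in `main.py` would need to be passed in as a string.

```python
import re
import sqlite3


def insert_columns(insert_sql: str) -> list[str]:
    """Return the explicit column list of an `INSERT INTO t (a, b, ...)` statement."""
    match = re.search(r"INSERT\s+INTO\s+\w+\s*\(([^)]*)\)", insert_sql, re.IGNORECASE)
    if match is None:
        raise ValueError("INSERT statement has no explicit column list")
    return [col.strip() for col in match.group(1).split(",") if col.strip()]


def unwritten_columns(conn, table, insert_sql, ignore=frozenset({"id"})):
    """Schema columns the INSERT never writes, so they silently keep their defaults."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    schema_cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    written = set(insert_columns(insert_sql))
    return [col for col in schema_cols if col not in written and col not in ignore]


# Hypothetical schema and INSERT, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trades (id INTEGER PRIMARY KEY, ticker TEXT, side TEXT, "
    "ensemble_probability REAL, raw_ensemble_probability REAL, "
    "model_count INTEGER DEFAULT 1, models_used TEXT DEFAULT '')"
)
INSERT_SQL = "INSERT INTO trades (ticker, side, ensemble_probability) VALUES (?, ?, ?)"

missing = unwritten_columns(conn, "trades", INSERT_SQL)
print(missing)  # ['raw_ensemble_probability', 'model_count', 'models_used']
```

Run as a test, a check like this fails as soon as a schema column is added without a matching entry in the INSERT.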
 
**C. Pivot to markets that actually have edge (real project).**
Abandon single-degree brackets. Target wider (≥3-5°F) brackets and above/below-threshold markets where |forecast − threshold| is large vs forecast error - the zones where NWS genuinely beats retail. **No historical data exists for these** (the bot only ever traded narrow brackets, and `market_history` is empty), so this *cannot be backtested* - it requires a shadow-mode data-collection window before any capital. Largest effort, only path with a plausible edge.
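
One way to make "large vs forecast error" concrete for a shadow-mode window is to express the forecast-to-threshold gap in units of historical forecast error and only log markets above a cutoff. This is a sketch under assumptions: the `ThresholdMarket` fields, the use of MAE as the error measure, and the 2.0 cutoff are placeholders, not values derived from any Hermes data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdMarket:
    ticker: str
    forecast_f: float      # point forecast, °F
    threshold_f: float     # market threshold, °F
    forecast_mae_f: float  # historical forecast error for this setup, °F


def separation(market: ThresholdMarket) -> float:
    """|forecast - threshold| measured in units of forecast error."""
    if market.forecast_mae_f <= 0:
        raise ValueError("forecast_mae_f must be positive")
    return abs(market.forecast_f - market.threshold_f) / market.forecast_mae_f


def shadow_candidates(markets, min_separation: float = 2.0):
    """Markets worth logging in shadow mode; no orders are placed here."""
    return [m for m in markets if separation(m) >= min_separation]


# Hypothetical markets for illustration.
markets = [
    ThresholdMarket("HYPO-A", forecast_f=88.0, threshold_f=80.0, forecast_mae_f=2.5),
    ThresholdMarket("HYPO-B", forecast_f=81.0, threshold_f=80.0, forecast_mae_f=2.5),
]
picked = shadow_candidates(markets)
print([m.ticker for m in picked])  # ['HYPO-A']
```

The cutoff itself is something the shadow-mode data would have to justify; until that data exists, any value is a guess.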
 
**D. Retire Hermes.**
If the appetite for a multi-week rebuild isn't there, the rational move for a no-edge bot is to stop. Funds stay safe.
 
**Do NOT:** re-enable auto on bracket markets, or remove the MAE-σ floor. The floor is helping; removing it reproduces the pre-Apr-21 catastrophic bleed.
 
---
 
## Recommendation
 
**B now (cheap, stops the misdiagnosis loop), then a deliberate choice between A/C/D - not under time pressure.** The one thing the data is unambiguous about: the current strategy (single-degree brackets) has no edge and should never be re-enabled as-is. Whether to invest in the pivot (C) or retire (D) is a question of how much you want to spend chasing a weather-trading edge that, per the quant playbook, exists only in market types Hermes has never actually traded.