# Hermes Pivot - Wider-Edge Market Selection (C1 spec)
 
**Date:** 2026-05-18
**Status:** Design spec for review. Implemented behind SHADOW MODE only (no capital). Auto remains OFF.
**Premise (from INVESTIGATION.md):** single-degree (~2°F) brackets have no edge - hit rate ~45%, reward:risk 0.51, break-even win rate ~66% (1 / (1 + 0.51)). The Gaussian model is *reliable* where the signal is strong; it only fails on sub-resolution brackets. The pivot trades **only where the model is actually trustworthy** and proves it on shadow data before risking money.
 
---
 
## Where edge actually exists (kalshi-quant playbook)
 
1. **Above/below-threshold markets with a large forecast-vs-threshold gap.** If NWS forecasts 88°F and the market is "85°+ ?", then with day-1 MAE ~2.5°F that's roughly an 88% YES (Gaussian with σ ≈ MAE: Φ(3/2.5) ≈ 0.88) - and retail frequently misprices the tails. This is the highest-confidence zone. Currently **disabled** (`ENABLE_THRESHOLD_MARKETS=False`) because the *broken pre-Apr-14 model* lost on them - not because the market type is bad.
2. **Wide brackets (≥ ~5°F span).** A 5°F bracket vs ~2.5-3°F forecast σ has real, well-estimated probability mass. The Gaussian works here; it only breaks on 1-2°F brackets below forecast resolution.
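
To make the two cases concrete, here is a minimal sketch of the Gaussian arithmetic behind them. It assumes the forecast error is Normal with σ set equal to the quoted MAE (2.5°F), and it ignores how Kalshi rounds settlement temperatures. The function names are illustrative and are not taken from the Hermes codebase.

```python
import math


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for X ~ Normal(mu, sigma), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def prob_at_or_above(threshold: float, forecast: float, sigma: float) -> float:
    """Model probability that the observed temperature is >= threshold."""
    return 1.0 - normal_cdf(threshold, forecast, sigma)


def prob_in_bracket(low: float, high: float, forecast: float, sigma: float) -> float:
    """Model probability that the observed temperature lands in [low, high]."""
    return normal_cdf(high, forecast, sigma) - normal_cdf(low, forecast, sigma)


# Assumption: sigma = 2.5 (the day-1 MAE quoted above), forecast = 88°F.
SIGMA = 2.5
FORECAST = 88.0

p_threshold = prob_at_or_above(85.0, FORECAST, SIGMA)      # "85°+ ?" market
p_wide = prob_in_bracket(85.5, 90.5, FORECAST, SIGMA)      # 5°F bracket centered on the forecast
p_narrow = prob_in_bracket(87.0, 89.0, FORECAST, SIGMA)    # 2°F bracket centered on the forecast

print(f"threshold: {p_threshold:.3f}  wide: {p_wide:.3f}  narrow: {p_narrow:.3f}")
```

Under these assumptions the narrow bracket holds only about a third of the probability mass even when it is perfectly centered, so a forecast error of a degree or two moves its probability a lot. The wide bracket and the threshold market are far less sensitive to the same error.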
 
## Selection rules (the spec)
 
| Rule | Value | Rationale |
|---|---|---|
| Threshold markets | **ENABLE**, but trade only if `\|nws_forecast − threshold\| ≥ GAP_K × horizon_city_mae` | Only act when the forecast is confidently on one side (genuine edge). Inside that band = noise, skip. |
| `GAP_K` | **1.5** (tunable from shadow data) | ~1.5σ ≈ forecast clearly past the threshold. |
| Brackets | **width ≥ MIN_BRACKET_WIDTH** | Kill the no-edge narrow-bracket trap entirely. |
| `MIN_BRACKET_WIDTH` | **5.0°F** | Below this the Gaussian + market efficiency leave no edge (proven, 138-trade history). |
| MAE-σ floor | **KEEP (unchanged)** | It is what stopped the pre-Apr-21 bleed. Do NOT remove. |
| Cities | **all 5 enabled for shadow** | LA/Denver/Miami were disabled on broken-model data; re-collect cleanly. Re-judge per-city from shadow EV. |
| `low` markets | **enabled for shadow** | Same - disabled on broken data; re-evaluate. |
| MIN_EDGE pre-filter | **NOT applied in shadow** | Log every evaluated market with its computed edge so the optimal threshold is chosen from data, not guessed. |
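
The two gating rules in the table can be sketched as a single predicate. `GAP_K`, `MIN_BRACKET_WIDTH`, `nws_forecast`, and `horizon_city_mae` come from the table. The `Market` dataclass and the `passes_selection` function are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional

GAP_K = 1.5               # from the spec; tunable from shadow data
MIN_BRACKET_WIDTH = 5.0   # °F, from the spec


@dataclass(frozen=True)
class Market:
    """Hypothetical minimal market description used only in this sketch."""
    kind: str                          # "threshold" or "bracket"
    threshold: Optional[float] = None  # °F, for threshold markets
    low: Optional[float] = None        # °F, lower bracket bound
    high: Optional[float] = None       # °F, upper bracket bound


def passes_selection(market: Market, nws_forecast: float, horizon_city_mae: float) -> bool:
    """Return True if the market is eligible under the wider-edge rules."""
    if market.kind == "threshold":
        if market.threshold is None:
            return False
        # Trade only when the forecast sits clearly on one side of the threshold.
        return abs(nws_forecast - market.threshold) >= GAP_K * horizon_city_mae
    if market.kind == "bracket":
        if market.low is None or market.high is None:
            return False
        # Reject brackets narrower than the forecast can resolve.
        return (market.high - market.low) >= MIN_BRACKET_WIDTH
    return False  # unknown market types are never selected


print(passes_selection(Market("threshold", threshold=85.0), 88.0, 2.5))  # gap 3.0 vs required 3.75
print(passes_selection(Market("threshold", threshold=85.0), 90.0, 2.5))  # gap 5.0 vs required 3.75
print(passes_selection(Market("bracket", low=85.0, high=90.0), 88.0, 2.5))
print(passes_selection(Market("bracket", low=87.0, high=89.0), 88.0, 2.5))
```

With `GAP_K = 1.5` and an MAE of 2.5°F the required gap is 3.75°F. An 88°F forecast against an 85°F threshold has only a 3.0°F gap and is skipped at this setting, which is one reason `GAP_K` is worth tuning against shadow data.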
 
## Shadow mode (C2 - safety-critical)
 
- Scanner **runs** (scan → ensemble → probability → edge → would-be decision) even though `auto_config.enabled=0`.
- Every evaluated market is written to `predictions` + `market_history` (currently empty) with: ensemble prob, raw (pre-clamp) prob, model_count, NWS forecast, threshold/bracket, computed edge, would-be side, market price, timestamp. Resolution backfilled by the existing settle loop.
- **Hard guard at the lowest level:** `kalshi_place_order()` becomes a structural no-op when `SHADOW_MODE` is set - it cannot reach the Kalshi order API regardless of any flag, loop, or config. Defense in depth, not a single boolean.
- No Discord "traded" announcements in shadow; a periodic "shadow scan logged N markets" summary instead.
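
One way to read "structural no-op" is that, in shadow mode, the order function is built without any reference to the network transport, so no later flag can route an order through it. The sketch below shows that idea only. `build_place_order`, `send_order`, and the returned dictionary shape are assumptions made for this example; only `kalshi_place_order` and `SHADOW_MODE` are names from this spec.

```python
from typing import Any, Callable, Dict

Order = Dict[str, Any]


def build_place_order(
    shadow_mode: bool,
    send_order: Callable[[Order], Order],
) -> Callable[[Order], Order]:
    """Return the order function to install as kalshi_place_order.

    In shadow mode the returned function does not capture `send_order`,
    so it has no code path to the real order API.
    """
    if shadow_mode:
        def kalshi_place_order(order: Order) -> Order:
            # No-op: echo the would-be order back so the caller can log it.
            return {"status": "shadow_noop", "order": order}
        return kalshi_place_order

    def kalshi_place_order(order: Order) -> Order:
        # Live path: delegate to the real transport.
        return send_order(order)
    return kalshi_place_order


# Usage with a fake transport that records every call it receives.
sent = []


def fake_send(order: Order) -> Order:
    sent.append(order)
    return {"status": "sent", "order": order}


place = build_place_order(shadow_mode=True, send_order=fake_send)
result = place({"ticker": "EXAMPLE", "side": "yes", "count": 1})
print(result["status"], len(sent))
```

Choosing the function once at startup, rather than checking a flag on every call, means a config change at runtime cannot switch a running shadow process to live orders.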
 
## ⚠️ Liquidity caveat (observed 2026-05-18, first shadow scans)
 
Most markets the scanner logs are **px = $0.01** with the model claiming 0.18-0.64 "edge". These are deep-longshot / illiquid Kalshi markets - the optimism-tax trap (kalshi-quant: never buy <$0.10). The huge edges are almost certainly artifacts of model overconfidence on near-zero-price contracts, NOT real alpha. **C4 must hard-filter `px ≥ $0.10` and require non-trivial volume before computing any EV**, or it will "discover" a fake edge and repeat the whole bleed cycle. This is now the single biggest risk to the pivot evaluation.
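
A minimal sketch of that hard filter follows. The $0.10 price floor is from this spec. The volume floor `MIN_VOLUME = 100`, the row field names, and the sample rows are placeholders invented for illustration, since the spec does not yet define "non-trivial volume".

```python
from typing import Dict, List

MIN_PRICE = 0.10   # dollars, from the spec
MIN_VOLUME = 100   # contracts; hypothetical placeholder, not a value from the spec


def is_liquid(price: float, volume: int,
              min_price: float = MIN_PRICE, min_volume: int = MIN_VOLUME) -> bool:
    """True if a shadow row is eligible for EV computation."""
    return price >= min_price and volume >= min_volume


def filter_liquid(rows: List[Dict]) -> List[Dict]:
    """Drop longshot and thin markets before any EV or Brier statistic is computed."""
    return [r for r in rows if is_liquid(r["price"], r["volume"])]


# Made-up rows shaped like the shadow log described above.
rows = [
    {"ticker": "A", "price": 0.01, "volume": 3, "edge": 0.64},    # longshot: dropped
    {"ticker": "B", "price": 0.35, "volume": 12, "edge": 0.08},   # thin: dropped
    {"ticker": "C", "price": 0.42, "volume": 850, "edge": 0.06},  # kept
]
kept = filter_liquid(rows)
print([r["ticker"] for r in kept])
```

The filter runs before any statistic is computed, so the large "edges" on $0.01 contracts never enter the averages.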
 
## Decision gate (C4 - before any capital)
 
After **≥30 resolved shadow predictions** on the new market set (post liquidity filter):
- Brier < 0.25 out-of-sample (real skill: a constant 50% forecast scores exactly 0.25), AND
- Simulated EV/trade clearly positive after Kalshi fees at a defensible edge threshold, AND
- Holds within the highest-volume city/market-type subset (not one lucky cell).
 
If it passes → propose (to user, explicitly) a small-size live pilot. If it fails → iterate rules or retire. **Never auto-enable.**
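
The gate can be written down as a check over resolved shadow rows, roughly as sketched here. The thresholds 30 and 0.25 are from this spec. The function names, the `min_ev` parameter standing in for "clearly positive", and the per-contract `fee` argument are assumptions; the actual Kalshi fee has to be supplied by the caller and is not modeled.

```python
from typing import Sequence

MIN_RESOLVED = 30   # from the spec
MAX_BRIER = 0.25    # from the spec


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted P(YES) and the 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def pnl_per_contract(side: str, yes_price: float, outcome: int, fee: float) -> float:
    """Simulated profit on one $1-payout contract held to settlement.

    side: "yes" or "no". outcome: 1 if the market resolved YES, else 0.
    fee: per-contract fee in dollars, supplied by the caller.
    """
    cost = yes_price if side == "yes" else 1.0 - yes_price
    won = (outcome == 1) if side == "yes" else (outcome == 0)
    return (1.0 if won else 0.0) - cost - fee


def passes_gate(probs: Sequence[float], outcomes: Sequence[int],
                pnls: Sequence[float], min_ev: float = 0.0) -> bool:
    """True only if sample size, Brier score, and mean simulated PnL all pass."""
    if not (len(probs) == len(outcomes) == len(pnls)):
        raise ValueError("probs, outcomes and pnls must have the same length")
    if len(outcomes) < MIN_RESOLVED:
        return False
    mean_pnl = sum(pnls) / len(pnls)
    return brier_score(probs, outcomes) < MAX_BRIER and mean_pnl > min_ev
```

The third criterion can be checked by calling `passes_gate` again on only the rows from the highest-volume city/market-type subset. A `True` result is an input to the explicit proposal to the user and never a trigger for enabling trading.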
 
## Why this is the only honest path
 
There is **zero historical data** on threshold/wide markets (the bot only ever traded narrow brackets; `market_history` is empty). The pivot therefore cannot be backtested - it must be forward-validated in shadow. This spec makes that collection safe and the success criteria explicit *before* collection starts, so the evaluation can't be rationalized after the fact.