# Hermes Pivot - Wider-Edge Market Selection (C1 spec)

**Date:** 2026-05-18
**Status:** Design spec for review. Implemented behind SHADOW MODE only (no capital). Auto remains OFF.
**Premise (from INVESTIGATION.md):** single-degree (~2°F) brackets have no edge: they hit ~45% at reward:risk 0.51, where the break-even win rate is 1 / (1 + 0.51) ≈ 66%. The Gaussian model is *reliable* where the signal is strong; it only fails on sub-resolution brackets. The pivot trades **only where the model is actually trustworthy** and proves it on shadow data before risking money.

---

## Where edge actually exists (kalshi-quant playbook)

1. **Above/below-threshold markets with a large forecast-vs-threshold gap.** If NWS forecasts 88°F and the market is "85°+ ?", then with day-1 MAE ~2.5°F that's roughly an 88% YES (treating the MAE as the Gaussian σ, z = 3 / 2.5 = 1.2) - and retail frequently misprices the tails. This is the highest-confidence zone. Currently **disabled** (`ENABLE_THRESHOLD_MARKETS=False`) because the *broken pre-Apr-14 model* lost on them - not because the market type is bad.
2. **Wide brackets (≥ ~5°F span).** A 5°F bracket vs ~2.5-3°F forecast σ has real, well-estimated probability mass. The Gaussian works here; it only breaks on 1-2°F brackets below forecast resolution.
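
Both cases reduce to reading probability mass off a normal distribution centered on the forecast. The following is a minimal sketch under that assumption; the function names are illustrative and not taken from the Hermes codebase. Whether σ should be the raw MAE or MAE·√(π/2) (the exact relation for zero-mean Gaussian errors) is left open here, so both are shown.

```python
import math


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for X ~ Normal(mu, sigma)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def sigma_from_mae(mae: float) -> float:
    """For zero-mean Gaussian errors, MAE = sigma * sqrt(2/pi)."""
    return mae * math.sqrt(math.pi / 2.0)


def prob_at_or_above(forecast: float, threshold: float, sigma: float) -> float:
    """P(observed temperature >= threshold), ignoring rounding to whole degrees."""
    return 1.0 - normal_cdf(threshold, forecast, sigma)


def prob_in_bracket(forecast: float, low: float, high: float, sigma: float) -> float:
    """P(low <= observed temperature <= high)."""
    return normal_cdf(high, forecast, sigma) - normal_cdf(low, forecast, sigma)


# Threshold example from the text: forecast 88°F, market "85°+", MAE 2.5°F.
p_simple = prob_at_or_above(88.0, 85.0, sigma=2.5)
p_scaled = prob_at_or_above(88.0, 85.0, sigma=sigma_from_mae(2.5))

# Bracket comparison: 5°F vs 2°F brackets centered on the forecast.
p_wide = prob_in_bracket(88.0, 85.5, 90.5, sigma=2.5)
p_narrow = prob_in_bracket(88.0, 87.0, 89.0, sigma=2.5)
```

With σ = 2.5°F the threshold example comes out near 0.88; converting the MAE to σ first gives about 0.83. At the same σ, a centered 5°F bracket holds about 68% of the mass and a centered 2°F bracket about 31%, which is the gap between a well-estimated probability and one dominated by forecast noise.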

## Selection rules (the spec)

| Rule | Value | Rationale |
|---|---|---|
| Threshold markets | **ENABLE**, but trade only if `\|nws_forecast − threshold\| ≥ GAP_K × horizon_city_mae` | Only act when the forecast is confidently on one side (genuine edge). Inside that band = noise, skip. |
| `GAP_K` | **1.5** (tunable from shadow data) | ~1.5σ ≈ forecast clearly past the threshold. |
| Brackets | **width ≥ MIN_BRACKET_WIDTH** | Kill the no-edge narrow-bracket trap entirely. |
| `MIN_BRACKET_WIDTH` | **5.0°F** | Below this the Gaussian + market efficiency leave no edge (proven, 138-trade history). |
| MAE-σ floor | **KEEP (unchanged)** | It is what stopped the pre-Apr-21 bleed. Do NOT remove. |
| Cities | **all 5 enabled for shadow** | LA/Denver/Miami were disabled on broken-model data; re-collect cleanly. Re-judge per-city from shadow EV. |
| `low` markets | **enabled for shadow** | Same - disabled on broken data; re-evaluate. |
| MIN_EDGE pre-filter | **NOT applied in shadow** | Log every evaluated market with its computed edge so the optimal threshold is chosen from data, not guessed. |
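
The two structural filters in the table (threshold gap and bracket width) could be expressed roughly as follows. This is a sketch, not the scanner's actual code: the `Market` dataclass and `is_eligible` are hypothetical names, and bracket width is assumed to be `high − low`, which may differ from how the real bracket bounds are represented. The MAE-σ floor, city list, and `low`-market switches are not modeled.

```python
from dataclasses import dataclass
from typing import Optional

GAP_K = 1.5              # from the spec; tunable from shadow data
MIN_BRACKET_WIDTH = 5.0  # °F, from the spec


@dataclass
class Market:
    """Hypothetical container for the fields the rules need."""
    kind: str                      # "threshold" or "bracket"
    nws_forecast: float            # °F
    horizon_city_mae: float        # °F, MAE for this city and forecast horizon
    threshold: Optional[float] = None
    bracket_low: Optional[float] = None
    bracket_high: Optional[float] = None


def is_eligible(m: Market) -> bool:
    """Apply the threshold-gap and bracket-width rules; unknown kinds are skipped."""
    if m.kind == "threshold":
        if m.threshold is None:
            return False
        gap = abs(m.nws_forecast - m.threshold)
        return gap >= GAP_K * m.horizon_city_mae
    if m.kind == "bracket":
        if m.bracket_low is None or m.bracket_high is None:
            return False
        return (m.bracket_high - m.bracket_low) >= MIN_BRACKET_WIDTH
    return False


wide = Market("bracket", 88.0, 2.5, bracket_low=85.0, bracket_high=90.0)
narrow = Market("bracket", 88.0, 2.5, bracket_low=87.0, bracket_high=89.0)
near = Market("threshold", 88.0, 2.5, threshold=85.0)  # gap 3.0°F
far = Market("threshold", 88.0, 2.5, threshold=84.0)   # gap 4.0°F
```

Worth noting: with MAE 2.5°F the gate requires a gap of at least 1.5 × 2.5 = 3.75°F, so an 88°F forecast against an 85° threshold (gap 3.0°F) would be skipped under `GAP_K = 1.5`, while an 84° threshold would pass.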

## Shadow mode (C2 - safety-critical)

- Scanner **runs** (scan → ensemble → probability → edge → would-be decision) even though `auto_config.enabled=0`.
- Every evaluated market is written to `predictions` + `market_history` (currently empty) with: ensemble prob, raw (pre-clamp) prob, model_count, NWS forecast, threshold/bracket, computed edge, would-be side, market price, timestamp. Resolution backfilled by the existing settle loop.
- **Hard guard at the lowest level:** `kalshi_place_order()` becomes a structural no-op when `SHADOW_MODE` is set - it cannot reach the Kalshi order API regardless of any flag, loop, or config. Defense in depth, not a single boolean.
- No Discord "traded" announcements in shadow; a periodic "shadow scan logged N markets" summary instead.
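
One way to picture the hard guard is an early return placed before any code path that could touch the network. The sketch below makes several assumptions that are not in the spec: `SHADOW_MODE` is read from an environment variable, the guard fails closed (shadow unless explicitly set to `0`), and the `order` dict and injected `send` callable stand in for whatever the real `kalshi_place_order()` signature and transport are.

```python
import os
from typing import Callable, Optional


def shadow_mode_active() -> bool:
    """Fail closed: only an explicit SHADOW_MODE=0 turns shadow mode off."""
    return os.environ.get("SHADOW_MODE", "1") != "0"


def kalshi_place_order(
    order: dict,
    send: Optional[Callable[[dict], dict]] = None,
) -> dict:
    """Place an order, or do nothing at all while shadow mode is active."""
    # The guard runs first, so no caller-side flag or loop can bypass it.
    if shadow_mode_active():
        return {"status": "shadow_noop", "order": order}
    if send is None:
        raise RuntimeError("no order transport configured")
    return send(order)


# Usage: the transport is never invoked while shadow mode is on.
os.environ["SHADOW_MODE"] = "1"
sent = []


def fake_send(o: dict) -> dict:
    sent.append(o)
    return {"status": "sent"}


result = kalshi_place_order({"ticker": "EXAMPLE"}, send=fake_send)
```

Defaulting to shadow when the variable is missing is a design choice of this sketch: a lost or mistyped setting then degrades to "no orders" rather than "live orders".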

## ⚠️ Liquidity caveat (observed 2026-05-18, first shadow scans)

Most markets the scanner logs are **px = $0.01** with the model claiming 0.18-0.64 "edge". These are deep-longshot / illiquid Kalshi markets - the optimism-tax trap (kalshi-quant: never buy <$0.10). The huge edges are almost certainly artifacts of model overconfidence on near-zero-price contracts, NOT real alpha. **C4 must hard-filter `px ≥ $0.10` and require non-trivial volume before computing any EV**, or it will "discover" a fake edge and repeat the whole bleed cycle. This is now the single biggest risk to the pivot evaluation.
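
The filter described above might look like this when applied to logged shadow rows before any EV is computed. The row keys (`px`, `volume`) and the `MIN_VOLUME` value are placeholders chosen for illustration; the spec only says "non-trivial volume", so the real cutoff would need to come from the data.

```python
from typing import Dict, List

MIN_PRICE = 0.10   # dollars, from the spec
MIN_VOLUME = 100   # contracts; hypothetical placeholder, not from the spec


def passes_liquidity_filter(
    px: float,
    volume: int,
    min_price: float = MIN_PRICE,
    min_volume: int = MIN_VOLUME,
) -> bool:
    """True only for contracts priced at or above the floor with enough volume."""
    return px >= min_price and volume >= min_volume


def filter_for_ev(rows: List[Dict]) -> List[Dict]:
    """Drop longshot and illiquid rows so they never enter the EV calculation."""
    return [r for r in rows if passes_liquidity_filter(r["px"], r["volume"])]


rows = [
    {"id": "a", "px": 0.01, "volume": 5000},  # deep longshot: dropped
    {"id": "b", "px": 0.35, "volume": 20},    # too thin: dropped
    {"id": "c", "px": 0.35, "volume": 800},   # kept
]
kept = filter_for_ev(rows)
```

If prices are stored as floats, comparing in integer cents avoids edge cases right at the $0.10 boundary.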

## Decision gate (C4 - before any capital)

After **≥30 resolved shadow predictions** on the new market set (post liquidity filter):
- Brier < 0.25 out-of-sample (0.25 is what a constant 50% forecast scores, so anything below it indicates real skill), AND
- Simulated EV/trade clearly positive after Kalshi fees at a defensible edge threshold, AND
- Holds within the highest-volume city/market-type subset (not one lucky cell).

If it passes → propose (to user, explicitly) a small-size live pilot. If it fails → iterate rules or retire. **Never auto-enable.**
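
The first two gate criteria can be checked mechanically. The sketch below is one possible reading, with assumptions flagged: `fee` is a per-contract figure supplied by the caller rather than Kalshi's actual fee formula, "clearly positive" is reduced to "mean P&L above a caller-chosen margin", and the function names are hypothetical. The third criterion can be approximated by running the same check on the highest-volume subset alone.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def pnl_per_contract(won: bool, price: float, fee: float) -> float:
    """P&L of buying one $1-payout contract at `price`, net of an assumed fee."""
    gross = (1.0 - price) if won else -price
    return gross - fee


def passes_gate(
    probs: Sequence[float],
    outcomes: Sequence[int],
    pnls: Sequence[float],
    min_n: int = 30,
    max_brier: float = 0.25,
    min_mean_pnl: float = 0.0,
) -> bool:
    """Sample-size, Brier, and mean-P&L checks; never enables trading itself."""
    n = len(outcomes)
    if n < min_n or len(pnls) != n:
        return False
    mean_pnl = sum(pnls) / n
    return brier_score(probs, outcomes) < max_brier and mean_pnl > min_mean_pnl


# Baseline check: a constant 50% forecast scores exactly 0.25.
baseline = brier_score([0.5, 0.5], [1, 0])
```

The function only returns a verdict; turning that verdict into a live pilot stays a manual, explicit decision, as the spec requires.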

## Why this is the only honest path

There is **zero historical data** on threshold/wide markets (the bot only ever traded narrow brackets; `market_history` is empty). The pivot therefore cannot be backtested - it must be forward-validated in shadow. This spec makes that collection safe and the success criteria explicit *before* collection starts, so the evaluation can't be rationalized after the fact.