# Hermes v4.0 - Research Findings & Pending Fixes
 
**Source:** `kalshi-weather-research.pdf` (compiled March 25, 2026)
**Date:** 2026-03-26 (updated 2026-04-03)
**Status:** Fixes 1, 2, and 7 implemented. Bracket fix patch deployed 2026-04-01. Net EV filter deployed 2026-04-03.
 
---
 
## 2026-04-03 - Net EV Dust Trade Filter (Deployed)
 
**Root cause found:** Kelly sizing with dampeners can produce bets that buy only 1 contract
on high-priced markets (e.g. KXHIGHLAX-26APR03-T74: 1 contract @ $0.64, worth $0.31 net EV).
These dust trades use up one of the 8 daily trade slots for pennies of expected value.
 
**Fix deployed (CT-REDACTED + CT-REDACTED):**
 
| Fix | What | Why |
|-----|------|-----|
| `MIN_TRADE_EV_PCT = 0.0015` | New guardrail constant (0.15% of bankroll) | Scales threshold with account size |
| Net EV gate | `net_ev = contracts × (bet_prob - ask_price) - fees` | Filters on expected dollars, not arbitrary contract counts |
| Skip + log | Skipped trades logged with EV, threshold, contract count | Full audit trail in scan actions / Discord |
 
**Why net EV over simpler alternatives:**
- Min contracts floor (e.g. 3): crude, doesn't account for price differences
- Raised min bet ($2): ignores whether the trade actually generates value
- Net EV: directly measures expected dollars per trade slot, captures fee drag, scales with bankroll
 
**Thresholds at current balances:**
- Hermes ($348): min EV = $0.52/trade
- Hermes2 ($49): min EV = $0.07/trade
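
The gate described above can be sketched as follows. This is a minimal illustration, not the deployed code: the function names `taker_fee` and `passes_net_ev_gate` are hypothetical, the fee model is assumed to be the variable taker formula from Fix 1, and the `bet_prob` of 0.97 in the usage note is an assumed value chosen to reproduce the $0.31 example.

```python
import math

MIN_TRADE_EV_PCT = 0.0015  # 0.15% of bankroll, from the fix table


def taker_fee(price, contracts):
    """Assumed fee model: variable taker fee, rounded up to the next cent."""
    raw_cents = 0.07 * contracts * price * (1 - price) * 100
    # round() strips float noise before ceil() so exact-cent fees are not bumped up
    return math.ceil(round(raw_cents, 6)) / 100


def passes_net_ev_gate(bankroll, contracts, bet_prob, ask_price):
    """Return (ok, net_ev, threshold) for a candidate trade, all in dollars."""
    fees = taker_fee(ask_price, contracts)
    net_ev = contracts * (bet_prob - ask_price) - fees
    threshold = MIN_TRADE_EV_PCT * bankroll
    return net_ev >= threshold, net_ev, threshold
```

Under these assumptions, `passes_net_ev_gate(348, 1, 0.97, 0.64)` yields a net EV of about $0.31 against a $0.52 threshold, so the trade is skipped on Hermes. The same trade would clear the $0.07 threshold on Hermes2.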
 
---
 
## 2026-04-01 - Bracket Fix Patch (Deployed)
 
**Root cause found:** `_ensemble_gaussian_bracket()` systematically underestimates bracket
probability when the ensemble has converged (sigma < 2°F). Outliers inflate Gaussian sigma,
spreading probability outside the bracket. Example: Chicago high ensemble converged to
40-41°F range, Gaussian said 31% bracket probability, raw count showed 74%. This inflated
NO edge above the 0.20 veto ceiling, causing risky bracket trades to bypass Sonnet review.
 
**4 fixes deployed (CT-REDACTED + CT-REDACTED):**
 
| Fix | What | Why |
|-----|------|-----|
| Hybrid bracket prob | `max(Gaussian, raw_count±0.5°F)` | Gaussian helps far-out; raw catches converged |
| Bracket veto trigger | Ensemble mean inside bracket → Sonnet review | Safety net for riskiest bracket scenario |
| Bet sizing hard cap | `min(bet, 8% × bankroll)` after Kelly | Prevents rounding overshoot |
| Raw prob column | `raw_ensemble_probability` in trades table | Audit: separate raw vs calibrated |
 
**Rejected after backtesting:**
- ±2°F NWS guard: Blocked 7 winners, 0 losses = -$16.19 net. NWS distance is NOT a predictor of bracket failure.
- METAR entry filter: Dead code. Trades placed 12-30h before observations become informative.
 
**Planned (weekend):** Bracket exit monitor - sells positions when 6 gates confirm edge flip.
 
**Key data points:**
- Historical bracket NO: 15W/6L, +$5.56 net, 71% win rate
- Losses cluster at NWS 3-10°F from bracket (big forecast busts), NOT near-bracket trades
- The live code already had a 50/50 blend + 5% floor approach - replaced with max() which is more accurate
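
A rough sketch of the `max(Gaussian, raw_count±0.5°F)` rule, for illustration only. The function name `hybrid_bracket_prob` is hypothetical, and applying the ±0.5°F padding to both the Gaussian and the raw count is an assumption; the deployed `_ensemble_gaussian_bracket()` may differ.

```python
import math
import statistics


def _normal_cdf(x, mean, sigma):
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (sigma * math.sqrt(2))))


def hybrid_bracket_prob(members, low, high, pad=0.5):
    """Return (hybrid, gaussian, raw) for a bracket [low, high] in °F.

    raw      = fraction of ensemble members inside the padded bracket
    gaussian = mass of a normal fit (sample mean, sample stdev) in the same range
    hybrid   = max of the two, per the fix table
    """
    lo, hi = low - pad, high + pad
    raw = sum(lo <= m <= hi for m in members) / len(members)
    mean = statistics.fmean(members)
    sigma = statistics.stdev(members)
    if sigma == 0:
        gaussian = raw  # all members identical: the fit is degenerate
    else:
        gaussian = _normal_cdf(hi, mean, sigma) - _normal_cdf(lo, mean, sigma)
    return max(gaussian, raw), gaussian, raw
```

With a synthetic 31-member ensemble in which 23 members sit at 40.5°F and 8 outliers spread from 34°F to 48°F, the outliers widen the fitted sigma enough that the Gaussian term falls well below the raw count of 23/31, and `max()` returns the raw count. That is the converged-ensemble failure mode described above.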
 
---
 
## Fixes To Implement (Priority Order)
 
### FIX 1: Variable Fee Formula (CRITICAL - blocking profitable trades now)
**Current:** Flat `TAKER_FEE_PER_CONTRACT = 0.05` applied to all trades.
**Correct:** `fee = ceil(0.07 * contracts * price * (1 - price))`, where `ceil` rounds up to the next cent (not the next dollar).
 
| Contract Price | Our Fee (flat) | Actual Fee (1 contract) | We're Wrong By |
|---------------|---------------|------------|----------------|
| $0.85 | $0.05 | $0.01 | 5x too high |
| $0.75 | $0.05 | $0.02 | 2.5x too high |
| $0.60 | $0.05 | $0.02 | 2.5x too high |
| $0.50 | $0.05 | $0.02 | 2.5x too high |
| $0.40 | $0.05 | $0.02 | 2.5x too high |
 
**Impact:** We're rejecting trades with 7-9% true edge because our inflated fee estimate makes them look below the 10% threshold. This is the single biggest leak in the system right now.
 
**Implementation:**
```python
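import math


def kalshi_taker_fee(price, contracts=1):
    raw_cents = 0.07 * contracts * price * (1 - price) * 100
    # Round off float noise first: 0.07 * 100 evaluates to 7.000000000000001,
    # so a bare ceil() would overcharge a cent on exact-cent fees.
    return math.ceil(round(raw_cents, 6)) / 100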
```
Replace the flat constant everywhere edge is calculated.
 
---
 
### FIX 2: Reduce Scan Interval from 30 Minutes to 5 Minutes
**Current:** Scanner runs every 30 minutes.
**Finding:** The competing bot (suislanchez, $1,325+ profit) scans every 5 minutes.
**Cost:** Effectively zero. 288 scans/day comes to 1,440 API calls/day (assuming about 5 calls per scan), well under the 10,000 limit.
**Benefit:** Catches mispricing faster, especially after GFS ensemble releases (data available ~3.5h after initialization at 00Z/06Z/12Z/18Z).
**Risk:** More Sonnet veto calls on Max plan. Mitigated by the filter pipeline - most markets get rejected before reaching Sonnet.
 
---
 
### FIX 3: Add Maker Orders for Better Fees
**Current:** All orders are taker (market) orders.
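**Finding:** The maker fee rate is 25% of the taker rate: `ceil(0.0175 * C * P * (1-P))`, rounded up to the next cent. At a $0.50 contract: taker fee = $0.02, maker fee = $0.01 (a 50% saving at this size, because both amounts round up).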
**Implementation:** For trades where market is not about to close (>6h to settlement), place limit orders slightly inside the spread instead of taking the ask. Fall back to taker if not filled within 10 minutes.
**Complexity:** Medium - requires order monitoring and cancellation logic.
**Priority:** After fix 1 and 2 are validated.
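
To compare the two fee schedules side by side, here is a small sketch using `decimal` so that cent rounding is exact. The helper name `kalshi_fee` and the rate constants are illustrative; the rates themselves are the 0.07 and 0.0175 quoted above.

```python
from decimal import Decimal, ROUND_CEILING

TAKER_RATE = Decimal("0.07")
MAKER_RATE = Decimal("0.0175")  # 25% of the taker rate
CENT = Decimal("0.01")


def kalshi_fee(price, contracts=1, rate=TAKER_RATE):
    """Fee in dollars for `contracts` at `price`, rounded up to the next cent."""
    p = Decimal(str(price))  # go through str to avoid binary float artifacts
    raw = rate * contracts * p * (1 - p)
    return raw.quantize(CENT, rounding=ROUND_CEILING)


def maker_saving(price, contracts=1):
    """Dollars saved by resting a limit order instead of taking the ask."""
    return kalshi_fee(price, contracts) - kalshi_fee(price, contracts, MAKER_RATE)
```

For 10 contracts at $0.50 this gives a taker fee of $0.18 and a maker fee of $0.05, so the saving grows with size. At 1 contract the saving is a single cent.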
 
---
 
### FIX 4: Extremized Aggregation (Replace Simple Calibration Multiply)
**Current:** `adj_prob = ens_prob * calibration_multiplier`
**Better:** Combine ensemble + NWS + base rates via log-odds with extremizing factor.
**Research:** Satopaa et al. (2014), Neyman & Roughgarden (2021) - optimal factor ~1.73 for robust aggregation.
**Implementation:**
```python
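import math


def extremize_aggregate(probabilities, weights=None, factor=1.5):
    """Weighted log-odds average of the inputs, pushed away from 0.5 by `factor`."""
    if weights is None:
        weights = [1.0] * len(probabilities)
    total = sum(weights)
    weights = [w / total for w in weights]  # normalize so weights sum to 1
    # Clamp to avoid log(0) and division by zero at p = 0 or p = 1.
    clamped = [max(0.001, min(0.999, p)) for p in probabilities]
    log_odds = [math.log(p / (1 - p)) for p in clamped]
    avg_lo = sum(w * lo for w, lo in zip(weights, log_odds))
    ext_lo = avg_lo * factor  # factor > 1 extremizes; factor = 1 is a plain average
    return 1 / (1 + math.exp(-ext_lo))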
```
**Notes:**
- Start with factor 1.5 (conservative - ensemble members share model physics, high info overlap)
- Weights: 0.5 ensemble, 0.3 NWS, 0.2 historical base rate
- Factor for weather should be lower than geopolitical (1.73) because ensemble members aren't independent

**Priority:** After 30 trades validate the current system works.
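
As a worked example of the aggregation, with extremizing factor $a$ and weights $w_i$ summing to 1:

$$
p_{\text{agg}} = \sigma\!\left(a \sum_i w_i \operatorname{logit}(p_i)\right),
\qquad \operatorname{logit}(p) = \ln\frac{p}{1-p},
\qquad \sigma(x) = \frac{1}{1+e^{-x}}
$$

The input probabilities below are illustrative assumptions, not real forecasts: ensemble 0.80, NWS 0.70, base rate 0.60, with the weights 0.5 / 0.3 / 0.2 from the notes and $a = 1.5$:

$$
\sum_i w_i \operatorname{logit}(p_i) \approx 0.5(1.386) + 0.3(0.847) + 0.2(0.405) \approx 1.028,
\qquad p_{\text{agg}} = \sigma(1.5 \times 1.028) \approx 0.824
$$

Without extremizing ($a = 1$) the same inputs give $\sigma(1.028) \approx 0.737$, so the factor moves the aggregate about 9 points further from 0.5.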
 
---
 
### FIX 5: Rain Ensemble Bias Correction
**Finding:** GFS ensemble over-forecasts light precipitation (false alarm rate too high). Raw member counting overestimates "any rain" probability.
**Source:** Zhu & Luo (2015), "Precipitation Calibration Based on the Frequency-Matching Method"
**Implementation:** Maintain rolling 30-day comparison of ensemble rain probability vs observed rain for each city. Apply frequency-matching correction.
**Priority:** After accumulating 20+ KXRAIN settlements to establish baseline bias.
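
One way the rolling comparison could look, as a sketch only. This uses a simple frequency-ratio scaling (observed rain frequency divided by mean forecast probability), which is a deliberately simplified stand-in and not the full frequency-matching method from Zhu & Luo. The class name `RainBiasTracker`, the 30-settlement window (approximating 30 days at one settlement per city per day), and the 20-sample minimum are assumptions taken loosely from the text above.

```python
from collections import deque


class RainBiasTracker:
    """Per-city rolling record of ensemble rain probability vs observed rain."""

    def __init__(self, window=30, min_samples=20):
        self.history = deque(maxlen=window)  # (forecast_prob, observed 0/1) pairs
        self.min_samples = min_samples

    def record(self, forecast_prob, rained):
        self.history.append((forecast_prob, 1.0 if rained else 0.0))

    def correct(self, forecast_prob):
        """Scale a raw ensemble probability by observed / forecast frequency."""
        if len(self.history) < self.min_samples:
            return forecast_prob  # not enough settlements to estimate bias
        n = len(self.history)
        mean_forecast = sum(f for f, _ in self.history) / n
        observed_freq = sum(o for _, o in self.history) / n
        if mean_forecast == 0:
            return forecast_prob
        scaled = forecast_prob * observed_freq / mean_forecast
        return max(0.0, min(1.0, scaled))
```

If the ensemble averaged 50% rain probability over the window but it rained on only 30% of days, a new 50% forecast would be scaled down to 30%.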
 
---
 
### FIX 6: City-Specific Low Temp Adjustments
**Finding:** Overnight lows are harder to forecast than highs due to radiative cooling, inversions, UHI effects.
**Risk ranking:**
| City | Low Temp Risk | Reason |
|------|-------------|--------|
| Denver | HIGHEST | Altitude + inversions + dry air + DEN airport 24mi from downtown |
| Chicago | HIGH | Lake effect + continental + inversion potential |
| NYC | MEDIUM | UHI 8°F+, airport vs Manhattan can differ 5°F+ at night |
| LA | MEDIUM | LAX coastal vs inland can differ 10-15°F on summer nights |
| Miami | LOWEST | Tropical maritime limits radiative cooling |
 
**Implementation:** Apply higher minimum edge for KXLOWT than KXHIGH. Possible: 12% for low vs 10% for high, with Denver KXLOWT at 15%.
**Priority:** Can implement now as a constant, tune after data.
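
Implemented as constants, the thresholds could be a small lookup like the sketch below. The names (`MIN_EDGE_BY_SERIES`, `MIN_EDGE_OVERRIDES`, `min_edge_for`) and the city keys are hypothetical; the values are the "possible" 10% / 12% / 15% figures above and would need tuning once settlement data accumulates.

```python
DEFAULT_MIN_EDGE = 0.10

# Base threshold per market series (values proposed above, not yet tuned)
MIN_EDGE_BY_SERIES = {
    "KXHIGH": 0.10,
    "KXLOWT": 0.12,
}

# City-specific overrides take precedence over the series default
MIN_EDGE_OVERRIDES = {
    ("KXLOWT", "Denver"): 0.15,
}


def min_edge_for(series, city):
    """Return the minimum edge required for a given series and city."""
    override = MIN_EDGE_OVERRIDES.get((series, city))
    if override is not None:
        return override
    return MIN_EDGE_BY_SERIES.get(series, DEFAULT_MIN_EDGE)
```

Keeping overrides in a separate table means a new city-specific value can be added without touching the series defaults.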
 
---
 
### FIX 7: Reduce Edge Threshold from 10% to 8%
**Finding:** The suislanchez bot uses 8% edge threshold and is profitable. With the variable fee fix (Fix 1), our true edge calculation becomes more accurate, so a lower threshold is justified.
**Current:** `MIN_EDGE = 0.10`
**Proposed:** `MIN_EDGE = 0.08` (matches competitor)
**Caveat:** Only after Fix 1 (variable fees) is implemented. With the flat $0.05 fee, 8% would let in bad trades.
**Priority:** Implement together with Fix 1.
 
---
 
### FIX 8: GFS Ensemble Release-Aware Scanning
**Finding:** GFS ensemble data becomes available ~3.5h after initialization:
| Run | Init (UTC) | Data Available (UTC) | Data Available (CDT) |
|-----|-----------|----------------------|----------------------|
| 00Z | 00:00 | ~03:30 | ~10:30 PM (previous day) |
| 06Z | 06:00 | ~09:30 | ~4:30 AM |
| 12Z | 12:00 | ~15:30 | ~10:30 AM |
| 18Z | 18:00 | ~21:30 | ~4:30 PM |
 
**Implementation:** After switching to 5-minute scans, no special timing needed - the bot naturally picks up new data. But could log which GFS run the ensemble came from for calibration purposes.
**Priority:** Low - 5-minute scans handle this implicitly.
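
For the logging idea, a minimal sketch of inferring which run a scan most likely used. It assumes a fixed 3.5-hour availability lag, which is only approximate in practice, and the function names `latest_available_run` and `run_label` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

AVAILABILITY_LAG = timedelta(hours=3, minutes=30)  # approximate, per the table above


def latest_available_run(now_utc):
    """Init time of the newest 6-hourly run whose data should be available."""
    cutoff = now_utc - AVAILABILITY_LAG
    init_hour = (cutoff.hour // 6) * 6  # floor to 00, 06, 12, or 18
    return cutoff.replace(hour=init_hour, minute=0, second=0, microsecond=0)


def run_label(init_time):
    """Compact label for logs, e.g. 20260403_12Z."""
    return f"{init_time:%Y%m%d}_{init_time:%H}Z"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(run_label(latest_available_run(now)))
```

A scan at 03:29 UTC would still be attributed to the previous day's 18Z run, and one at 03:30 UTC to the 00Z run. Storing this label with each trade would let calibration be split by run.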
 
---
 
## Research Findings (Reference - No Code Changes Needed)
 
### FINDING 1: Kalshi Balance Earns 3.75-4% APY
Kalshi pays yield on total account balance. At $60 this is negligible ($2.25/year), but at $500+ it becomes a consideration - idle cash isn't fully idle.
 
### FINDING 2: Maker Fee History - Rounding Exploit Was Real
Before July 2025, maker fees were flat $0.0025/contract. On $0.02 contracts, this rounded to $0.01 - a 50% effective fee. Kalshi fixed this. Current variable formula eliminates the rounding issue.
 
### FINDING 3: Post-2024 Kalshi Regime Change
Before 2024 Q4, takers made money on average. After Kalshi's legal victory and volume explosion ($30M to $820M/quarter), professional market makers entered. Takers now lose on average. Our edge MUST come from better information (ensemble data), not from market structure.
 
### FINDING 4: Weather is in the "Other" Category at ~10% of Volume
Weather/climate is Kalshi's original niche but only ~10% of total notional volume. Lower volume = potentially wider spreads but also less competition from sophisticated market makers.
 
### FINDING 5: Longshot Bias Confirmed with Kalshi Data
72.1M trade analysis confirms: contracts below 10% implied probability consistently underperform for buyers. Our $0.40 price floor already exploits this by forcing us to trade in the 40-99 cent range where mispricing exists without the longshot trap.
 
### FINDING 6: Becker Dataset Available for Backtesting
Full Parquet dataset at github.com/Jon-Becker/prediction-market-analysis. Could filter to weather tickers and compute actual historical mispricing, time-of-day effects, and pre/post ensemble release patterns.
 
### FINDING 7: NBM May Be Superior to Raw GFS Ensemble
The National Blend of Models (NBM v4.3) already applies bias correction + quantile mapping to GFS/GEFS/HRRR/ECMWF. Could be used as a third probability source alongside ensemble and NWS for extremized aggregation.
 
### FINDING 8: Fan & van den Dool (2011) Key Results
- GFS ensemble 2m temp errors are dominated by large-scale spatial patterns
- 30-day mean forecast errors produce more robust bias corrections than 7-day means
- Cold season shows more removable bias than warm season
- ~60% of total error variance captured by leading EOF modes
 
### FINDING 9: UHI Effect Is Larger at Night
Urban Heat Island warming runs 2-5°F at night versus 1-7°F in the daytime: the nighttime floor is higher, although the daytime range is wider. Since Kalshi settles on airport METAR stations (often outside the UHI), the model grid cell may include urban warming the airport doesn't see. This creates a systematic warm bias in low temp ensemble forecasts for urban stations.
 
### FINDING 10: GEFS Reforecast Data Available on AWS
`s3://noaa-gefs-retrospective/GEFSv12/reforecast/` - could build city-specific MAE/bias tables from 2000-present instead of waiting for live trade data. 11 ensemble members (vs 31 operational) but large historical sample.
 
---
 
## Open-Source Bot Comparison
 
| Aspect | suislanchez Bot | Hermes v4 |
|--------|----------------|-----------|
| Profit | $1,325+ confirmed | $0 (just deployed) |
| Edge threshold | 8% | 10% (should lower to 8%) |
| Scan interval | 5 minutes | 30 minutes (should lower to 5) |
| Kelly fraction | 15% (0.15x) | Variable (0.125-0.375x by confidence) |
| Markets | KXHIGH only | KXHIGH + KXLOWT + KXRAIN |
| Fee model | Unknown | Variable formula (pending fix) |
| NWS cross-check | No | Yes (high, low, rain) |
| Sonnet veto gate | No | Yes |
| Correlation guard | No | Yes (2 per city/date) |
| Calibration | Brier only | Per-type per-city per-season |
 
---
 
## Implementation Order
 
| Phase | Fixes | When |
|-------|-------|------|
| **Now** | Fix 1 (variable fees) + Fix 7 (lower threshold to 8%) | Immediate |
| **This week** | Fix 2 (5-min scans) + Fix 6 (city-specific low temp adjustments) | After Fix 1 validated |
| **After 30 trades** | Fix 4 (extremized aggregation) + Fix 5 (rain bias correction) | Need data first |
| **After 50 trades** | Fix 3 (maker orders) + Fix 8 (release-aware logging) | Optimization phase |
