architecture notes and the alert-to-case playbook

a9d0d3a   Zion Boggan committed on Apr 4, 2026
docs/architecture.md +55 -0
@@ -0,0 +1,55 @@
+# Architecture
+
+## Data flow
+
+1. Agents on Windows and Linux endpoints ship logs and Sysmon events to the Wazuh
+ manager over an encrypted channel on 1514/tcp.
+2. The manager decodes events and runs them against the bundled ruleset plus the
+ custom rules in `config/wazuh/rules/local_rules.xml`.
+3. Any alert at level 10 or higher matches the `<integration>` block and the
+ manager invokes `integrations/custom-thehive`, which forwards a normalized
+ payload to the Shuffle webhook.
+4. Shuffle resolves the indicator of interest (source or destination IP), pulls
+ reputation from VirusTotal and OTX, and computes a verdict score.
+5. A TheHive case is opened with the score-derived severity, the indicator is
+ attached as an observable flagged as an IOC, and the analyst channel gets a
+ message linking the case.
+6. Active response is wired to rule 100210 (traffic to a CTI-flagged IP), which
+ triggers a local `firewall-drop` with a 10-minute timeout while the analyst
+ confirms.
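
The routing in steps 3 and 6 lives in Wazuh configuration rather than code, but the decision it encodes is small enough to sketch in Python. The constants mirror the numbers above; the function name `route_alert` and the action strings are made up for illustration, and the alert shape assumes the usual `rule.id` / `rule.level` fields of a Wazuh alert.

```python
from __future__ import annotations

INTEGRATION_MIN_LEVEL = 10          # level threshold on the <integration> block
ACTIVE_RESPONSE_RULES = {"100210"}  # the only rule wired to firewall-drop
FIREWALL_DROP_TIMEOUT_S = 10 * 60   # 10-minute block


def route_alert(alert: dict) -> list[str]:
    """Return the actions the manager would take for one alert (illustrative only)."""
    rule = alert.get("rule", {})
    level = int(rule.get("level", 0))
    rule_id = str(rule.get("id", ""))

    actions = []
    if level >= INTEGRATION_MIN_LEVEL:
        actions.append("forward-to-shuffle")
    if rule_id in ACTIVE_RESPONSE_RULES:
        actions.append(f"firewall-drop:{FIREWALL_DROP_TIMEOUT_S}s")
    return actions
```

A level-12 alert for rule 100210 gets both actions; a level-5 alert for any other rule gets none.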
+
+## Why these three tools
+
+- **Wazuh** gives agent-based collection, FIM, vulnerability detection, and a
+ rule language flexible enough to express the detections I cared about without
+ bolting on a separate log shipper.
+- **Shuffle** is the glue. Keeping the orchestration logic out of Wazuh means the
+ enrichment and case-creation logic is versioned and testable on its own, and
+ the SIEM only has to know one webhook URL.
+- **TheHive** is where the analyst actually works. Cases arrive with the verdict
+ and the observable already attached, so triage starts from evidence.
+
+## Severity mapping
+
+The scoring step in the workflow turns reputation counts into a TheHive severity:
+
+| Score | TheHive severity | Trigger |
+|-------|------------------|---------|
+| 0-1 | Low (1) | nothing notable on the indicator |
+| 2-5 | Medium (2) | OTX pulses or a couple of VT hits |
+| 6+ | High (3) | three or more VT detections, or fewer VT hits plus enough OTX pulses to reach 6 |
+
+`score = vt_malicious * 2 + otx_pulse_count`.
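
As a minimal sketch, the formula and the table above reduce to two small functions. The function names are mine, not something exported by the Shuffle workflow; the thresholds are copied from the table.

```python
def verdict_score(vt_malicious: int, otx_pulse_count: int) -> int:
    """score = vt_malicious * 2 + otx_pulse_count"""
    return vt_malicious * 2 + otx_pulse_count


def thehive_severity(score: int) -> int:
    """Map a verdict score to a TheHive severity: 1 = Low, 2 = Medium, 3 = High."""
    if score >= 6:
        return 3
    if score >= 2:
        return 2
    return 1
```

Because VirusTotal hits are weighted double, three detections alone (score 6) are enough for High, while it takes six OTX pulses with no VT hits to get there.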
+
+## Threat coverage
+
+The custom rules map to MITRE ATT&CK so the cases are tagged with technique IDs:
+
+| Rule | Technique |
+|------|-----------|
+| 100101 process from user-writable path | T1059 |
+| 100102 Office spawns script host | T1566 / T1059.001 |
+| 100110 LSASS access | T1003.001 |
+| 100120 / 100121 service + scheduled task | T1543.003 / T1053.005 |
+| 100200 / 100201 SSH + RDP brute force | T1110 |
+| 100210-100212 CTI list hits | T1071 / T1204 |
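
For reference, the same table as a lookup, kept at the table's own granularity. This is only a sketch: `COVERAGE` and `techniques_for` are hypothetical names, and in the running stack the tags would normally come from each rule's own MITRE metadata in the alert rather than from a side table like this.

```python
# Each entry mirrors one table row: (rule IDs in the row, techniques in the row).
COVERAGE = [
    (("100101",), ("T1059",)),
    (("100102",), ("T1566", "T1059.001")),
    (("100110",), ("T1003.001",)),
    (("100120", "100121"), ("T1543.003", "T1053.005")),
    (("100200", "100201"), ("T1110",)),
    (("100210", "100211", "100212"), ("T1071", "T1204")),
]


def techniques_for(rule_id) -> tuple:
    """Return the techniques listed on the rule's table row, or () if unknown.

    For rows that group several rules, this returns the whole row, because the
    table does not say which technique belongs to which individual rule.
    """
    for rule_ids, techniques in COVERAGE:
        if str(rule_id) in rule_ids:
            return techniques
    return ()
```
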
docs/playbook.md +60 -0
@@ -0,0 +1,60 @@
+# Playbook walkthrough
+
+This is the path a single alert takes, end to end, using rule 100210 (outbound
+connection to a CTI-flagged IP) as the example.
+
+## 1. Detection
+
+An endpoint makes a connection to `45.137.21.9`, which is present in
+`cti-malicious-ip`. Wazuh rule 100210 fires at level 12 and the alert is written
+to `alerts.json`.
+
+## 2. Handoff
+
+Level 12 clears the integration's level-10 threshold, so the manager runs
+`custom-thehive` with the alert file.
+The integration extracts the fields that matter - rule id, level, description,
+MITRE ids, agent identity, source and destination IPs, and the full log - and
+POSTs them to the Shuffle webhook as JSON.
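
A rough sketch of that extraction and POST, not the actual contents of `custom-thehive`. The input keys (`rule.mitre.id`, `agent`, `data.srcip`, `data.dstip`, `full_log`) follow the usual shape of a Wazuh alert, but where the IPs live depends on the decoder; the output key names and both function names are assumptions made for this example.

```python
import json
import urllib.request


def build_payload(alert: dict) -> dict:
    """Pull the fields listed above out of a Wazuh alert (output keys are illustrative)."""
    rule = alert.get("rule", {})
    agent = alert.get("agent", {})
    data = alert.get("data", {})
    return {
        "rule_id": str(rule.get("id", "")),
        "level": int(rule.get("level", 0)),
        "description": rule.get("description", ""),
        "mitre_ids": rule.get("mitre", {}).get("id", []),
        "agent": {
            "id": agent.get("id"),
            "name": agent.get("name"),
            "ip": agent.get("ip"),
        },
        "src_ip": data.get("srcip"),
        "dst_ip": data.get("dstip"),
        "full_log": alert.get("full_log", ""),
    }


def post_to_shuffle(payload: dict, webhook_url: str, timeout: float = 10.0) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Loading the alert file and reading the webhook URL from the integration's arguments are left out here to keep the sketch short.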
+
+## 3. Enrichment
+
+Shuffle's router picks the destination IP. VirusTotal and OTX are queried in
+parallel. The scoring step combines the results:
+
+```
+vt_malicious = 8
+otx_pulse_count = 3
+score = 8 * 2 + 3 = 19 -> severity High
+```
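
The indicator choice and the parallel lookups could look roughly like this in Python. It is a sketch, not the Shuffle workflow itself: `pick_indicator` and `enrich` are hypothetical names, preferring the destination IP and falling back to the source is my assumption about the router, and the lookups are passed in as plain callables so no real VirusTotal or OTX client is implied.

```python
from concurrent.futures import ThreadPoolExecutor


def pick_indicator(payload: dict):
    """Prefer the destination IP; fall back to the source IP (assumed behaviour)."""
    return payload.get("dst_ip") or payload.get("src_ip")


def enrich(indicator: str, lookups: dict) -> dict:
    """Run every lookup in parallel.

    `lookups` maps a result name to a callable taking the indicator and
    returning a count. A lookup that raises is recorded as None instead of
    aborting the whole enrichment.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=len(lookups) or 1) as pool:
        futures = {name: pool.submit(fn, indicator) for name, fn in lookups.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=15)
            except Exception:
                results[name] = None
    return results
```

With stub lookups returning 8 and 3, `enrich` hands back the same two counts used in the worked score above.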
+
+## 4. Case creation
+
+A TheHive case is opened:
+
+- Title: `[Wazuh] Outbound connection to CTI-flagged IP: 45.137.21.9`
+- Severity: High
+- Tags: `wazuh`, `automated`, `T1071`
+- Description carries the agent, the rule, the reputation counts, and the raw log
+
+The destination IP is attached as an observable, marked as an IOC, so it flows
+into TheHive's observable history and can be swept against other cases.
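
Sketched as plain dictionaries, the case and its observable look roughly like this. The field names (`title`, `description`, `severity`, `tags`, `dataType`, `data`, `ioc`) follow TheHive's case and observable schema as I understand it, so check them against the API version you run; the input payload keys and both function names are hypothetical.

```python
def build_case(payload: dict, indicator: str, vt_malicious: int,
               otx_pulse_count: int, severity: int) -> dict:
    """Assemble the case body from the alert payload and the reputation counts."""
    description = "\n".join([
        f"Agent: {payload.get('agent_name', 'unknown')}",
        f"Rule: {payload.get('rule_id', 'unknown')}",
        f"VirusTotal malicious: {vt_malicious}",
        f"OTX pulses: {otx_pulse_count}",
        "",
        "Raw log:",
        payload.get("full_log", ""),
    ])
    return {
        "title": f"[Wazuh] {payload.get('description', 'Alert')}: {indicator}",
        "description": description,
        "severity": severity,  # 1 = Low, 2 = Medium, 3 = High
        "tags": ["wazuh", "automated", *payload.get("mitre_ids", [])],
    }


def build_observable(indicator: str) -> dict:
    """Describe the indicator as an IP observable flagged as an IOC."""
    return {"dataType": "ip", "data": indicator, "ioc": True}
```

Sending the two bodies to TheHive is a pair of authenticated POSTs, which I have left out because the endpoint paths differ between TheHive versions.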
+
+## 5. Containment
+
+Rule 100210 is also wired to active response, so the manager issues a
+`firewall-drop` on the endpoint for 600 seconds. That buys time without making the
+block permanent - the analyst decides whether to extend it.
+
+## 6. Notification
+
+Slack gets a one-line summary with the severity, the indicator, the reputation
+counts, and the case id. The analyst opens the case already knowing what it is.
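
A sketch of how that one-liner might be assembled. Slack incoming webhooks accept a JSON body with a `text` field; the exact wording of the message, the function name, and the example case id are assumptions.

```python
def slack_summary(severity_label: str, indicator: str, vt_malicious: int,
                  otx_pulse_count: int, case_id: str) -> dict:
    """Build a single-line Slack message body for an incoming webhook."""
    text = (
        f"[{severity_label}] {indicator} | "
        f"VT malicious: {vt_malicious}, OTX pulses: {otx_pulse_count} | "
        f"TheHive case {case_id}"
    )
    return {"text": text}
```

For the walkthrough alert, `slack_summary("High", "45.137.21.9", 8, 3, "42")` yields `[High] 45.137.21.9 | VT malicious: 8, OTX pulses: 3 | TheHive case 42`.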
+
+## Tuning notes
+
+- The level-10 threshold on the integration is deliberate. Pushing everything to
+ TheHive buries analysts; the brute-force and CTI rules are the ones worth a case.
+- The active-response block is scoped to a single rule on purpose. Auto-blocking on
+ a noisier rule would be a great way to firewall yourself out of your own hosts.
+- If VirusTotal rate-limits a lookup (the public API allows 4 requests per minute),
+ the scoring step treats the missing result as zero rather than failing case creation.
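
The missing-as-zero behaviour from the last note can be sketched as follows, assuming a failed or rate-limited lookup surfaces as `None`. Both function names are hypothetical.

```python
from typing import Optional


def count_or_zero(value: Optional[int]) -> int:
    """Treat a missing lookup result as zero hits."""
    return 0 if value is None else int(value)


def score_with_fallback(vt_malicious: Optional[int],
                        otx_pulse_count: Optional[int]) -> int:
    """score = vt_malicious * 2 + otx_pulse_count, with missing results counted as 0."""
    return count_or_zero(vt_malicious) * 2 + count_or_zero(otx_pulse_count)
```

One trade-off to keep in mind: in the score, a rate-limited lookup is indistinguishable from a clean one. With VirusTotal missing, the walkthrough alert would score 3 (Medium) instead of 19 (High).
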
docs/screenshots/README.md +30 -0
@@ -0,0 +1,30 @@
+# Screenshots
+
+These come from the running stack. Bring it up with `./scripts/deploy.sh`, enroll an
+agent, and fire a test alert, then capture the shots below. Drop the files in this
+directory with the names listed and they'll render in the main README.
+
+Capture at a consistent width (1280-1440 px), and annotate the call-out in each shot
+(a red circle/arrow is enough).
+
+| File | Where | Annotate |
+|------|-------|----------|
+| `01-wazuh-alerts.png` | Wazuh dashboard → Security events, filtered to rule level ≥ 10 | the CTI / brute-force rule that fired, and its MITRE technique tag |
+| `02-shuffle-workflow.png` | Shuffle → the imported `Wazuh -> TheHive Enrichment` workflow canvas | the enrichment → scoring → case-creation path |
+| `03-shuffle-run.png` | Shuffle → a finished run of that workflow | the VirusTotal/OTX result feeding the severity score |
+| `04-thehive-case.png` | TheHive → the auto-created case | the severity, the MITRE tags, and the attached IOC observable |
+| `05-agent-enrolled.png` | Wazuh dashboard → Agents | the enrolled endpoint reporting in |
+
+## Triggering a test alert
+
+The quickest way to get a case end-to-end without waiting for real activity:
+
+```bash
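+# From an enrolled endpoint, look up (or connect to) a value you've placed in one
+# of the CTI lists; cdn-jquery-min.net is the example entry, substitute your own.
+nslookup cdn-jquery-min.net
+# Or generate repeated failed SSH logins to trip rule 100200
+# (baduser and TARGET_HOST are placeholders).
+for i in $(seq 1 10); do ssh -o BatchMode=yes -o ConnectTimeout=3 baduser@TARGET_HOST true; done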
+```
+
+That fires a level-12 rule, which hits the integration, which runs the Shuffle
+workflow, which opens the TheHive case - giving you shots 01 through 04 from a single
+event.