a9d0d3a · SOC Automation Lab

architecture notes and the alert-to-case playbook

a9d0d3a Zion Boggan committed on Apr 4, 2026

docs/architecture.md +55 -0

		@@ -0,0 +1,55 @@
	+	# Architecture
	+
	+	## Data flow
	+
	+	1. Agents on Windows and Linux endpoints ship logs and Sysmon events to the Wazuh
	+	manager over an encrypted channel on 1514/tcp.
	+	2. The manager decodes events and runs them against the bundled ruleset plus the
	+	custom rules in `config/wazuh/rules/local_rules.xml`.
	+	3. Any alert at level 10 or higher matches the `<integration>` block and the
	+	manager invokes `integrations/custom-thehive`, which forwards a normalized
	+	payload to the Shuffle webhook.
	+	4. Shuffle resolves the indicator of interest (source or destination IP), pulls
	+	reputation from VirusTotal and OTX, and computes a verdict score.
	+	5. A TheHive case is opened with the score-derived severity, the indicator is
	+	attached as an observable flagged as an IOC, and the analyst channel gets a
	+	message linking the case.
	+	6. Active response is wired for rule 100210 (traffic to a CTI-flagged IP), which
	+	triggers a local `firewall-drop` with a 10-minute timeout while the analyst
	+	confirms.
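Step 4 does not say how the workflow chooses between the source and the destination IP. One plausible rule, sketched here as an assumption rather than the workflow's actual logic, is to prefer the destination and fall back to the source, skipping anything that is not a publicly routable address. The name `pick_indicator` and the internal address `10.0.0.5` are hypothetical.

```python
import ipaddress
from typing import Optional


def pick_indicator(src_ip: Optional[str], dst_ip: Optional[str]) -> Optional[str]:
    """Return the first publicly routable IP, checking destination before source."""
    for candidate in (dst_ip, src_ip):
        if not candidate:
            continue
        try:
            address = ipaddress.ip_address(candidate)
        except ValueError:
            continue  # not an IP literal, so it cannot be looked up as one
        if address.is_global:
            return candidate
    return None


# Outbound connection: the destination is the external host.
print(pick_indicator("10.0.0.5", "45.137.21.9"))
# Inbound brute force: the destination is internal, so the source is used.
print(pick_indicator("45.137.21.9", "10.0.0.5"))
```

Returning `None` when neither address is public gives the caller an explicit "nothing to enrich" signal instead of sending a private address to a reputation service.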
	+
	+	## Why these three tools
	+
	+	- Wazuh gives agent-based collection, FIM, vulnerability detection, and a
	+	rule language flexible enough to express the detections I cared about without
	+	bolting on a separate log shipper.
	+	- Shuffle is the glue. Keeping the orchestration logic out of Wazuh means the
	+	enrichment and case-creation logic is versioned and testable on its own, and
	+	the SIEM only has to know one webhook URL.
	+	- TheHive is where the analyst actually works. Cases arrive with the verdict
	+	and the observable already attached, so triage starts from evidence.
	+
	+	## Severity mapping
	+
	+	The scoring step in the workflow turns reputation counts into a TheHive severity:
	+
	+	| Score | TheHive severity | Trigger |
	+	|-------|------------------|---------|
	+	| 0-1 | Low (1) | nothing notable on the indicator |
	+	| 2-5 | Medium (2) | OTX pulses or a couple of VT hits |
	+	| 6+ | High (3) | multiple VT detections plus OTX pulses |
	+
	+	`score = vt_malicious * 2 + otx_pulse_count`.
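The formula and the table together amount to two small functions. The sketch below follows the stated bands exactly; the function names are illustrative, and the real scoring step lives in the Shuffle workflow, which may be written differently.

```python
def verdict_score(vt_malicious: int, otx_pulse_count: int) -> int:
    """score = vt_malicious * 2 + otx_pulse_count, as stated in the doc."""
    return vt_malicious * 2 + otx_pulse_count


def thehive_severity(score: int) -> int:
    """Map a score onto the table's bands: 0-1 -> 1, 2-5 -> 2, 6+ -> 3."""
    if score >= 6:
        return 3  # High
    if score >= 2:
        return 2  # Medium
    return 1  # Low


# Two VT detections and no OTX pulses: score 4, which lands in the Medium band.
print(verdict_score(2, 0), thehive_severity(verdict_score(2, 0)))
```

Because each VirusTotal detection counts double, a single detection (score 2) is already enough to move an indicator from Low to Medium.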
	+
	+	## Threat coverage
	+
	+	The custom rules map to MITRE ATT&CK so the cases are tagged with technique IDs:
	+
	+	| Rule | Technique |
	+	|------|-----------|
	+	| 100101 process from user-writable path | T1059 |
	+	| 100102 Office spawns script host | T1566 / T1059.001 |
	+	| 100110 LSASS access | T1003.001 |
	+	| 100120 / 100121 service + scheduled task | T1543.003 / T1053.005 |
	+	| 100200 / 100201 SSH + RDP brute force | T1110 |
	+	| 100210-100212 CTI list hits | T1071 / T1204 |

docs/playbook.md +60 -0

		@@ -0,0 +1,60 @@
	+	# Playbook walkthrough
	+
	+	This is the path a single alert takes, end to end, using rule 100210 (outbound
	+	connection to a CTI-flagged IP) as the example.
	+
	+	## 1. Detection
	+
	+	An endpoint makes a connection to `45.137.21.9`, which is present in
	+	`cti-malicious-ip`. Wazuh rule 100210 fires at level 12 and the alert is written
	+	to `alerts.json`.
	+
	+	## 2. Handoff
	+
	+	Level 12 meets the integration's level-10 threshold, so the manager runs
	+	`custom-thehive` with the alert file. The integration extracts the fields that
	+	matter - rule id, level, description, MITRE ids, agent identity, source and
	+	destination IPs, and the full log - and POSTs them to the Shuffle webhook as JSON.
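A minimal sketch of the extraction step, assuming the usual Wazuh alert layout (`rule.id`, `rule.level`, `rule.mitre.id`, `agent`, `data.srcip`, `data.dstip`, `full_log`). The output key names, the `build_payload` helper, and the sample values are illustrative and not taken from `integrations/custom-thehive`; the HTTP POST itself is left out.

```python
import json


def build_payload(alert: dict) -> dict:
    """Reduce a Wazuh alert to the fields the enrichment workflow needs."""
    rule = alert.get("rule", {})
    agent = alert.get("agent", {})
    data = alert.get("data", {})
    return {
        "rule_id": rule.get("id"),
        "level": rule.get("level"),
        "description": rule.get("description"),
        "mitre_ids": rule.get("mitre", {}).get("id", []),
        "agent": {
            "id": agent.get("id"),
            "name": agent.get("name"),
            "ip": agent.get("ip"),
        },
        "src_ip": data.get("srcip"),
        "dst_ip": data.get("dstip"),
        "full_log": alert.get("full_log"),
    }


# Made-up alert shaped like the rule 100210 example.
sample_alert = {
    "rule": {
        "id": "100210",
        "level": 12,
        "description": "Outbound connection to CTI-flagged IP",
        "mitre": {"id": ["T1071"]},
    },
    "agent": {"id": "001", "name": "lab-endpoint", "ip": "10.0.0.5"},
    "data": {"srcip": "10.0.0.5", "dstip": "45.137.21.9"},
    "full_log": "raw event text goes here",
}

print(json.dumps(build_payload(sample_alert), indent=2))
```

Using `.get()` with defaults at every level means an alert from a decoder that omits `data` or `mitre` still produces a payload, with `None` or an empty list in the missing slots.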
	+
	+	## 3. Enrichment
	+
	+	Shuffle's router picks the destination IP. VirusTotal and OTX are queried in
	+	parallel. The scoring step combines the results:
	+
	+	```
	+	vt_malicious = 8
	+	otx_pulse_count = 3
	+	score = 8 * 2 + 3 = 19 -> severity High
	+	```
	+
	+	## 4. Case creation
	+
	+	A TheHive case is opened:
	+
	+	- Title: `[Wazuh] Outbound connection to CTI-flagged IP: 45.137.21.9`
	+	- Severity: High
	+	- Tags: `wazuh`, `automated`, `T1071`
	+	- Description carries the agent, the rule, the reputation counts, and the raw log
	+
	+	The destination IP is attached as an observable, marked as an IOC, so it flows
	+	into TheHive's observable history and can be swept against other cases.
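As a rough illustration, the case and observable could be assembled as plain dictionaries before being handed to TheHive. The field names (`title`, `severity`, `tags`, `description`, and `dataType`, `data`, `ioc` on the observable) follow TheHive's case and observable models as I understand them; the helper names are hypothetical and the API call itself is left out, since the endpoint depends on the TheHive version in use.

```python
from typing import List


def build_case(rule_description: str, indicator: str, severity: int,
               technique_ids: List[str], details: str) -> dict:
    """Assemble the case fields listed in the walkthrough."""
    return {
        "title": f"[Wazuh] {rule_description}: {indicator}",
        "severity": severity,  # 1 = Low, 2 = Medium, 3 = High
        "tags": ["wazuh", "automated", *technique_ids],
        "description": details,
    }


def build_observable(indicator: str) -> dict:
    """Describe the indicator as an IP observable flagged as an IOC."""
    return {"dataType": "ip", "data": indicator, "ioc": True}


case = build_case(
    rule_description="Outbound connection to CTI-flagged IP",
    indicator="45.137.21.9",
    severity=3,
    technique_ids=["T1071"],
    details="agent, rule, reputation counts, and raw log go here",
)
print(case["title"])
print(build_observable("45.137.21.9"))
```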
	+
	+	## 5. Containment
	+
	+	Rule 100210 is also wired to active response, so the manager issues a
	+	`firewall-drop` on the endpoint for 600 seconds. That buys time without making the
	+	block permanent - the analyst decides whether to extend it.
	+
	+	## 6. Notification
	+
	+	Slack gets a one-line summary with the severity, the indicator, the reputation
	+	counts, and the case id. The analyst opens the case already knowing what it is.
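The one-line summary might be built like this. The sketch assumes a Slack incoming webhook, which accepts a JSON body with a `text` field; the message layout, the `slack_summary` helper, and the case id `42` are made up for illustration.

```python
SEVERITY_LABELS = {1: "Low", 2: "Medium", 3: "High"}


def slack_summary(severity: int, indicator: str, vt_malicious: int,
                  otx_pulse_count: int, case_id: str) -> dict:
    """Build the JSON body for a one-line Slack notification."""
    label = SEVERITY_LABELS.get(severity, "Unknown")
    text = (
        f"[{label}] {indicator} | VT malicious: {vt_malicious}, "
        f"OTX pulses: {otx_pulse_count} | TheHive case {case_id}"
    )
    return {"text": text}


print(slack_summary(3, "45.137.21.9", 8, 3, "42")["text"])
```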
	+
	+	## Tuning notes
	+
	+	- The level-10 threshold on the integration is deliberate. Pushing everything to
	+	TheHive buries analysts; the brute-force and CTI rules are the ones worth a case.
	+	- The active-response block is scoped to a single rule on purpose. Auto-blocking on
	+	a noisier rule would be a great way to firewall yourself out of your own hosts.
	+	- If VirusTotal rate-limits (the free tier is 4 req/min), the scoring step treats a
	+	missing result as zero rather than failing the case creation.
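The "missing result counts as zero" behaviour from the last note can be sketched as a small guard in front of the score formula. The result shapes here (a flat dict with a `malicious` or `pulse_count` key) are simplified assumptions, not the actual VirusTotal or OTX response formats, and `count_or_zero` is a hypothetical helper.

```python
from typing import Optional


def count_or_zero(result: Optional[dict], key: str) -> int:
    """Return result[key] if it is an integer, otherwise 0."""
    if not result:
        return 0  # lookup was skipped, rate-limited, or failed
    value = result.get(key)
    return value if isinstance(value, int) else 0


vt_result = None                 # e.g. the VirusTotal step was rate-limited
otx_result = {"pulse_count": 3}  # OTX answered normally

score = count_or_zero(vt_result, "malicious") * 2 + count_or_zero(otx_result, "pulse_count")
print(score)
```

The trade-off is that a rate-limited lookup can understate severity: here the score is 3 (Medium) even though a full VirusTotal result might have pushed it into High, so the case description should make clear which lookups actually returned.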

docs/screenshots/README.md +30 -0

		@@ -0,0 +1,30 @@
	+	# Screenshots
	+
	+	These come from the running stack. Bring it up with `./scripts/deploy.sh`, enroll an
	+	agent, and fire a test alert, then capture the shots below. Drop the files in this
	+	directory with the names listed and they'll render in the main README.
	+
	+	Capture at a consistent width (1280-1440 px), and annotate the call-out in each shot
	+	(a red circle/arrow is enough).
	+
	+	| File | Where | Annotate |
	+	|------|-------|----------|
	+	| `01-wazuh-alerts.png` | Wazuh dashboard → Security events, filtered to rule level ≥ 10 | the CTI / brute-force rule that fired, and its MITRE technique tag |
	+	| `02-shuffle-workflow.png` | Shuffle → the imported `Wazuh -> TheHive Enrichment` workflow canvas | the enrichment → scoring → case-creation path |
	+	| `03-shuffle-run.png` | Shuffle → a finished run of that workflow | the VirusTotal/OTX result feeding the severity score |
	+	| `04-thehive-case.png` | TheHive → the auto-created case | the severity, the MITRE tags, and the attached IOC observable |
	+	| `05-agent-enrolled.png` | Wazuh dashboard → Agents | the enrolled endpoint reporting in |
	+
	+	## Triggering a test alert
	+
	+	The quickest way to get a case end-to-end without waiting for real activity:
	+
	+	```bash
	+	# From an enrolled endpoint, look up a value you've placed in one of the CTI
	+	# lists (cdn-jquery-min.net is only an example entry; use your own).
	+	nslookup cdn-jquery-min.net
	+	# Alternatively, replay a sample brute-force against SSH to trip rule 100200.
	+	```
	+
	+	That fires a level-12 rule, which hits the integration, which runs the Shuffle
	+	workflow, which opens the TheHive case - giving you shots 01 through 04 from a single
	+	event.