45c66c9 · CTI Detection Automation

readme, architecture + approval-flow docs, screenshots and walkthrough

45c66c9 Zion Boggan committed on May 28, 2026

README.md +115 -0

		@@ -0,0 +1,115 @@
	+	# CTI Detection Automation
	+
	+	Pulls indicators from live threat-intel feeds, deduplicates them, extracts the
	+	MITRE ATT&CK techniques behind them, turns the result into Wazuh detection rules
	+	and CDB lists, and emails an analyst for sign-off before anything goes live. No
	+	rule reaches the SIEM without a human approving it.
	+
	+	This is the piece that feeds the watchlists used by the
	+	[SOC automation lab](../soc-automation-lab) - the `cti-malicious-ip`,
	+	`cti-malicious-domain`, and `cti-malware-hash` lists that its Wazuh rules look
	+	up are all generated here.
	+
	+	![Approval email](docs/screenshots/01-approval-email.png)
	+
	+	## What it does
	+
	+	- Collects from ThreatFox, Feodo Tracker, URLhaus, AlienVault OTX, and OpenPhish,
	+	covering IPs, domains, URLs, file hashes, phishing, exploit-kit and malware-download
	+	URLs, C2 infrastructure, and (optionally) leaked credentials.
	+	- Normalizes every source into one indicator model and deduplicates across
	+	feeds - an IP seen in both ThreatFox and Feodo becomes one indicator carrying both
	+	sources, the higher confidence, and the union of techniques.
	+	- Extracts and dedups TTPs: ATT&CK technique IDs come straight from OTX pulses and
	+	are inferred from malware family and threat type for the feeds that don't carry them,
	+	then collapsed into a ranked coverage list.
	+	- Generates Wazuh rules: CDB lists per indicator type plus an XML ruleset whose
	+	list-lookup rules are tagged with the dominant ATT&CK techniques for that bucket.
	+	- Gates on approval: every run produces a candidate bundle and emails the analyst a
	+	signed, time-limited review link. Rules are written to Wazuh only when the analyst
	+	approves.
	+
	+	```mermaid
	+	flowchart LR
	+	F[ThreatFox / Feodo / URLhaus<br/>OTX / OpenPhish] --> N[normalize]
	+	N --> D[deduplicate]
	+	D --> C{confidence<br/>>= threshold}
	+	C --> T[extract + dedup TTPs]
	+	T --> G[generate CDB lists<br/>+ Wazuh rules]
	+	G --> M[email analyst<br/>signed review link]
	+	M -->|approve| W[promote to Wazuh]
	+	M -->|reject| X[discard]
	+	```
	+
	+	## The approval gate
	+
	+	The pipeline never deploys on its own. It writes the candidate to a staging
	+	directory and emails a review link. The analyst sees the diff against the last
	+	approved bundle, the indicator counts, and the ATT&CK coverage, then approves or
	+	rejects.
	+
	+	| Review page | Bundle history |
	+	|---|---|
	+	| ![Review](docs/screenshots/02-review-page.png) | ![Dashboard](docs/screenshots/03-dashboard.png) |
	+
	+	Short walkthrough of the flow: [docs/media/cti-approval-walkthrough.mp4](docs/media/cti-approval-walkthrough.mp4)
	+
	+	The review link is an `itsdangerous` signed token with a TTL, so it can't be
	+	forged without the signing secret and it stops working once the TTL elapses.
	+
	+	## Run the demo
	+
	+	No API keys needed - the demo runs against the bundled fixtures:
	+
	+	```bash
	+	make install
	+	make demo # generates a bundle and writes the approval email to output/emails/
	+	make serve # approval console on http://localhost:8080
	+	```
	+
	+	Open the email under `output/emails/`, click through to the review page, and
	+	approve. The promoted lists and rules land in `output/active/`.
	+
	+	## Run it for real
	+
	+	```bash
	+	cp .env.example .env # set CTI_APPROVAL_SECRET and feed API keys
	+	cp config.example.yaml config.yaml
	+	python -m cti.cli run -c config.yaml
	+	```
	+
	+	ThreatFox and OTX need free API keys (`THREATFOX_AUTH_KEY`, `OTX_API_KEY`); Feodo,
	+	URLhaus, and OpenPhish are keyless. Set `email.backend: smtp` and the SMTP settings
	+	in `config.yaml` to send the approval mail to a real inbox. Point `wazuh_etc_dir` at
	+	the manager's `/var/ossec/etc` and approval will write the lists and rules straight
	+	into place.
	+
	+	Schedule it with the systemd timer in `deploy/` (hourly by default) or with a cron job.
	+
	+	## Layout
	+
	+	```
	+	src/cti/feeds/       one connector per source, each with a pure parse()
	+	src/cti/dedup.py     cross-feed merge
	+	src/cti/ttp.py       ATT&CK extraction + coverage report
	+	src/cti/rules.py     CDB list + Wazuh XML generation
	+	src/cti/approval.py  signed tokens + email rendering + SMTP
	+	src/cti/pipeline.py  orchestration, candidate staging, promotion
	+	src/cti/web.py       Flask approval console
	+	fixtures/            sample feed payloads for the demo and tests
	+	tests/               pytest suite (feeds, dedup, ttp, rules, approval, web)
	+	```
	+
	+	## Tests
	+
	+	```bash
	+	make test
	+	```
	+
	+	Connectors are split into `fetch` and `parse`, so the suite runs the real parsing
	+	logic against fixtures with no network. Coverage includes cross-feed dedup, TTP
	+	extraction, well-formedness of the generated rules, token signing/expiry, and the
	+	full approve/reject path through the web app.
	+
	+	See [docs/architecture.md](docs/architecture.md) for the data model and
	+	[docs/approval-flow.md](docs/approval-flow.md) for the gate in detail.

docs/approval-flow.md +64 -0

		@@ -0,0 +1,64 @@
	+	# Approval flow
	+
	+	The gate is the point of the project, so it gets its own document.
	+
	+	## 1. Bundle generated
	+
	+	A pipeline run produces a candidate bundle and stages it under
	+	`output/candidates/<bundle-id>/`:
	+
	+	```
	+	lists/cti-malicious-ip
	+	lists/cti-malicious-domain
	+	lists/cti-malicious-url
	+	lists/cti-malware-hash
	+	lists/cti-leaked-email
	+	local_cti_rules.xml
	+	ttp_coverage.md
	+	manifest.json
	+	keys.json
	+	```
	+
	+	`manifest.json` carries the counts, the technique list, and the diff against the
	+	last approved bundle (added / removed / unchanged).
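The added / removed / unchanged split can be sketched as plain set arithmetic. This is a minimal illustration, assuming the baseline and the candidate are each reduced to a set of `(type, value)` pairs; `diff_against_baseline` and the returned field names are hypothetical and not necessarily the manifest's real schema.

```python
from typing import Iterable

Key = tuple[str, str]  # (indicator type, lowercased value)


def diff_against_baseline(baseline: Iterable[Key], candidate: Iterable[Key]) -> dict:
    """Compare a candidate bundle with the last approved one (hypothetical helper)."""
    old, new = set(baseline), set(candidate)
    return {
        "added": sorted(new - old),      # first seen in this run
        "removed": sorted(old - new),    # aged out of the feeds
        "unchanged": sorted(new & old),  # carried over from the baseline
    }


# Made-up sample data using documentation address ranges.
baseline = {("ip", "198.51.100.7"), ("domain", "bad.example")}
candidate = {("ip", "198.51.100.7"), ("ip", "203.0.113.9")}
diff = diff_against_baseline(baseline, candidate)
```

Sorting the three buckets keeps the manifest stable between runs, which makes it easier to compare two manifests by eye.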
	+
	+	## 2. Analyst emailed
	+
	+	The email summarizes the bundle - new vs. aged-out indicators, counts by type, and
	+	the extracted ATT&CK techniques - and links to the review page. The link embeds an
	+	`itsdangerous` signed token:
	+
	+	```
	+	URLSafeTimedSerializer(secret, salt="cti-rule-approval").dumps({"bundle_id": ...})
	+	```
	+
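	+	The token is signed with `CTI_APPROVAL_SECRET`. With `URLSafeTimedSerializer` the
	+	token carries a signed timestamp rather than the TTL itself; the TTL (`token_ttl`,
	+	default 24h) is applied when the token is verified. It can't be forged without the
	+	secret and stops working once it expires, so a stale email can't approve an old bundle.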
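The two properties claimed here (unforgeable without the secret, useless after the TTL) can be shown with a standard-library stand-in. This is not `itsdangerous` and does not reproduce its token format; `sign`, `verify`, and the `iat` field are hypothetical names for illustration. With `itsdangerous` itself, the corresponding check is `loads(token, max_age=...)`, which raises `SignatureExpired` or `BadSignature`.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def sign(secret: bytes, bundle_id: str, now: Optional[float] = None) -> str:
    """Return 'payload.signature'; the payload holds the bundle id and issue time."""
    issued = int(time.time() if now is None else now)
    body = json.dumps({"bundle_id": bundle_id, "iat": issued}).encode()
    payload = base64.urlsafe_b64encode(body).decode()
    mac = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{mac}"


def verify(secret: bytes, token: str, max_age: int, now: Optional[float] = None) -> str:
    """Return the bundle id, or raise ValueError if the token is forged or expired."""
    payload, _, mac = token.rpartition(".")
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    # Check the signature before trusting anything inside the payload.
    if not hmac.compare_digest(mac.encode(), expected.encode()):
        raise ValueError("bad signature")
    data = json.loads(base64.urlsafe_b64decode(payload))
    current = time.time() if now is None else now
    if current - data["iat"] > max_age:
        raise ValueError("token expired")
    return data["bundle_id"]


# Fixed clock values keep the example deterministic.
token = sign(b"demo-secret", "bundle-001", now=1_000)
bundle_id = verify(b"demo-secret", token, max_age=86_400, now=4_600)
```

Note that the TTL is enforced by the verifier, not stored in the token, so shortening `max_age` takes effect for links that were already sent.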
	+
	+	The backend is pluggable: `file` (writes the rendered email to disk, used by the
	+	demo), `console`, or `smtp` for a real inbox.
	+
	+	## 3. Review
	+
	+	The review page renders the manifest - the diff cards, indicators by type, the
	+	generated CDB lists and their sizes, and the full ATT&CK coverage table. The analyst
	+	has the whole picture without leaving the page.
	+
	+	## 4. Decision
	+
	+	- Approve → `promote` copies the candidate into `output/active/`, updates the
	+	baseline used for the next diff, and (when `wazuh_etc_dir` is set) writes the lists
	+	and rules into the Wazuh manager. The manager loads them on its next reload.
	+	- Reject → a `REJECTED` marker with the optional reason is written to the
	+	candidate directory and nothing is deployed.
	+
	+	Either way the decision is recorded and the bundle's status shows on the dashboard.
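The two outcomes can be sketched with plain filesystem operations. This is a simplified illustration rather than the code in `src/cti/pipeline.py`: the function signatures are assumptions, and the baseline update and the write into `wazuh_etc_dir` are left out.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def promote(candidate: Path, active: Path) -> None:
    """Approve: replace the active bundle with a copy of the candidate."""
    if active.exists():
        shutil.rmtree(active)
    shutil.copytree(candidate, active)


def reject(candidate: Path, reason: Optional[str] = None) -> None:
    """Reject: leave a REJECTED marker with the optional reason; deploy nothing."""
    (candidate / "REJECTED").write_text((reason or "") + "\n")


# Throwaway directory tree standing in for output/.
root = Path(tempfile.mkdtemp())
candidate = root / "candidates" / "demo-bundle"
(candidate / "lists").mkdir(parents=True)
(candidate / "lists" / "cti-malicious-ip").write_text("203.0.113.9:botnet_cc\n")

promote(candidate, root / "active")
```

Replacing the whole `active` directory, instead of merging into it, means indicators that aged out of the feeds also drop out of the deployed lists.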
	+
	+	## Token, not session
	+
	+	There's deliberately no login. The signed link is the authorization - it's what was
	+	emailed to the analyst, it's scoped to one bundle, and it expires. That keeps the
	+	service stateless and means it can sit behind a reverse proxy on the SOC network
	+	without its own user store. For a multi-analyst setup you'd put SSO in front of it;
	+	the token model doesn't change.

docs/architecture.md +79 -0

		@@ -0,0 +1,79 @@
	+	# Architecture
	+
	+	## Indicator model
	+
	+	Everything from every feed is normalized into one `Indicator`:
	+
	+	```
	+	type         ip | domain | url | sha256 | md5 | sha1 | email
	+	value        the indicator itself
	+	source       comma-joined list of feeds that reported it
	+	threat_type  botnet_cc | phishing | payload_delivery | ...
	+	confidence   0-100
	+	malware      family name when known
	+	techniques   list of ATT&CK technique IDs
	+	tags         free-form labels from the source
	+	```
	+
	+	A feed connector does two things: `fetch_raw()` talks to the API, and `parse()`
	+	turns the raw payload into `Indicator` objects. They're separate so the parsing
	+	logic can be tested against fixtures without a network, and so adding a feed is
	+	just one new `parse()`.
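A connector following that split might look roughly like this. `ExampleFeed`, its URL, and the payload keys (`ioc_type`, `ioc`) are made up for illustration; only the `fetch_raw()` / `parse()` division and the `Indicator` fields come from the description above.

```python
import json
from dataclasses import dataclass, field
from urllib.request import urlopen


@dataclass
class Indicator:
    type: str
    value: str
    source: str
    threat_type: str = ""
    confidence: int = 0
    malware: str = ""
    techniques: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)


class ExampleFeed:
    """Hypothetical connector; the URL and payload shape are invented."""

    name = "examplefeed"
    url = "https://feed.example/api/v1/recent"

    def fetch_raw(self) -> str:
        # All network I/O lives here and nowhere else.
        with urlopen(self.url, timeout=30) as resp:
            return resp.read().decode("utf-8")

    def parse(self, raw: str) -> list[Indicator]:
        # Pure function of its input, so fixtures can drive it in tests.
        return [
            Indicator(
                type=row["ioc_type"],
                value=row["ioc"],
                source=self.name,
                threat_type=row.get("threat_type", ""),
                confidence=int(row.get("confidence", 0)),
                malware=row.get("malware", ""),
                tags=list(row.get("tags", [])),
            )
            for row in json.loads(raw)
        ]


# A fixture string takes the place of a live response.
fixture = '[{"ioc_type": "ip", "ioc": "203.0.113.9", "threat_type": "botnet_cc", "confidence": 75}]'
indicators = ExampleFeed().parse(fixture)
```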
	+
	+	## Deduplication
	+
	+	Feeds overlap heavily - the same C2 IP shows up in ThreatFox and Feodo, the same
	+	domain in OTX and URLhaus. Dedup keys on `(type, lowercased value)` and merges:
	+
	+	- confidence becomes the max across sources
	+	- techniques and tags become the union
	+	- sources are concatenated, so provenance is preserved
	+	- malware family and first-seen are taken from the first source that had them
	+
	+	The result is one record per indicator that's stronger than any single feed's view
	+	of it.
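The merge rules can be sketched as follows. This is a minimal version under a few assumptions: the `Indicator` here is trimmed to the fields the merge touches, first-seen is omitted, and the stored value is kept as the first source reported it while only the key is lowercased.

```python
from dataclasses import dataclass, field, replace


@dataclass
class Indicator:
    type: str
    value: str
    source: str
    confidence: int = 0
    malware: str = ""
    techniques: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)


def _union(a: list[str], b: list[str]) -> list[str]:
    """Order-preserving union of two lists."""
    return list(dict.fromkeys([*a, *b]))


def deduplicate(indicators: list[Indicator]) -> list[Indicator]:
    merged: dict[tuple[str, str], Indicator] = {}
    for ind in indicators:
        key = (ind.type, ind.value.lower())
        seen = merged.get(key)
        if seen is None:
            # Copy so the caller's objects are never mutated.
            merged[key] = replace(ind, techniques=list(ind.techniques), tags=list(ind.tags))
            continue
        seen.confidence = max(seen.confidence, ind.confidence)
        seen.techniques = _union(seen.techniques, ind.techniques)
        seen.tags = _union(seen.tags, ind.tags)
        seen.source = ",".join(_union(seen.source.split(","), ind.source.split(",")))
        seen.malware = seen.malware or ind.malware  # first source that had it wins
    return list(merged.values())


# Made-up sample: the same IP reported by two feeds.
feeds = [
    Indicator("ip", "203.0.113.9", "threatfox", 75, "", ["T1071"]),
    Indicator("ip", "203.0.113.9", "feodo", 90, "Emotet", ["T1071", "T1105"]),
]
result = deduplicate(feeds)
```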
	+
	+	## TTP extraction
	+
	+	Techniques come from three places, in order of trust:
	+
	+	1. Directly from the feed - OTX pulses carry `attack_ids`, which are used as-is.
	+	2. From the malware family - a CobaltStrike or AgentTesla indicator maps to that
	+	family's common techniques (`src/cti/mitre.py`).
	+	3. From the threat type - a phishing URL maps to T1566.002 / T1204.001 even when
	+	no family is named.
	+
	+	Extraction then collapses these across all indicators into a ranked
	+	`Technique` list with per-technique indicator counts and contributing sources, which
	+	becomes the coverage report attached to every bundle.
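A rough sketch of that extraction and ranking is below. The mapping tables are illustrative placeholders, not the contents of `src/cti/mitre.py`, and the policy is an assumption: feed-supplied IDs are used when present, otherwise the family and threat-type inferences are combined. Indicators are plain dicts to keep the example short.

```python
from collections import Counter, defaultdict

# Illustrative mappings only; the project's real tables may differ.
FAMILY_TECHNIQUES = {"CobaltStrike": ["T1071.001", "T1059.001"]}
THREAT_TYPE_TECHNIQUES = {"phishing": ["T1566.002", "T1204.001"]}


def techniques_for(ind: dict) -> list[str]:
    """Prefer IDs supplied by the feed; otherwise infer from family, then threat type."""
    direct = list(ind.get("techniques", []))
    if direct:
        return direct
    inferred = FAMILY_TECHNIQUES.get(ind.get("malware", ""), [])
    inferred = inferred + THREAT_TYPE_TECHNIQUES.get(ind.get("threat_type", ""), [])
    return list(dict.fromkeys(inferred))  # drop duplicates, keep order


def rank_coverage(indicators: list[dict]) -> list[tuple[str, int, list[str]]]:
    """Return (technique, indicator count, contributing sources), most common first."""
    counts: Counter = Counter()
    sources = defaultdict(set)
    for ind in indicators:
        for tid in techniques_for(ind):
            counts[tid] += 1
            sources[tid].add(ind["source"])
    return [(tid, n, sorted(sources[tid])) for tid, n in counts.most_common()]


coverage = rank_coverage([
    {"source": "otx", "techniques": ["T1566.002"]},
    {"source": "openphish", "threat_type": "phishing"},
    {"source": "threatfox", "malware": "CobaltStrike"},
])
```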
	+
	+	## Rule generation
	+
	+	Two artifacts come out of a bundle:
	+
	+	CDB lists - one file per indicator type (`cti-malicious-ip`,
	+	`cti-malicious-domain`, `cti-malicious-url`, `cti-malware-hash`, `cti-leaked-email`),
	+	in Wazuh's `key:value` format where the value is the malware family or threat type.
	+	These are the actual lookup tables.
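Rendering the lists could look something like this. Several details are assumptions: that the three hash types share `cti-malware-hash`, that `unknown` is the fallback value, and that keys containing a colon (URLs, IPv6 addresses) are wrapped in double quotes so they don't clash with the `key:value` separator. Quoted keys are worth checking against the Wazuh version in use.

```python
from collections import defaultdict

# Assumed mapping from indicator type to list file.
LIST_FOR_TYPE = {
    "ip": "cti-malicious-ip",
    "domain": "cti-malicious-domain",
    "url": "cti-malicious-url",
    "sha256": "cti-malware-hash",
    "md5": "cti-malware-hash",
    "sha1": "cti-malware-hash",
    "email": "cti-leaked-email",
}


def cdb_line(key: str, value: str) -> str:
    """Format one entry, quoting keys that contain the separator."""
    if ":" in key:
        key = f'"{key}"'
    return f"{key}:{value}"


def render_cdb_lists(indicators: list[dict]) -> dict[str, str]:
    """Return {list name: file content} for the indicator types present."""
    lists = defaultdict(list)
    for ind in indicators:
        label = ind.get("malware") or ind.get("threat_type") or "unknown"
        lists[LIST_FOR_TYPE[ind["type"]]].append(cdb_line(ind["value"], label))
    return {name: "\n".join(sorted(lines)) + "\n" for name, lines in lists.items()}


rendered = render_cdb_lists([
    {"type": "ip", "value": "203.0.113.9", "malware": "Emotet", "threat_type": "botnet_cc"},
    {"type": "ip", "value": "198.51.100.7", "threat_type": "botnet_cc"},
    {"type": "url", "value": "http://bad.example/a", "threat_type": "phishing"},
])
```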
	+
	+	XML rules - list-lookup rules (one per type present) that fire when a field
	+	matches an entry in the corresponding CDB list. Each rule is tagged with the
	+	techniques that dominate that bucket, so an alert arrives already mapped to ATT&CK.
	+	Rule IDs start at a configurable base (default 100300) to stay clear of the bundled
	+	Wazuh ruleset and the lab's hand-written rules.
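A sketch of the XML side, using `xml.etree.ElementTree`. The `<list field=... lookup=...>` and `<mitre><id>` elements are standard Wazuh rule syntax; the group name, rule level, description text, and the `dns.query` field name are assumptions for illustration and may not match what `src/cti/rules.py` emits.

```python
import xml.etree.ElementTree as ET


def build_ruleset(buckets: list[dict], base_id: int = 100300, level: int = 10) -> str:
    """Emit one list-lookup rule per bucket, tagged with its ATT&CK technique IDs."""
    group = ET.Element("group", name="cti,threat_intel,")
    for offset, bucket in enumerate(buckets):
        rule = ET.SubElement(group, "rule", id=str(base_id + offset), level=str(level))
        lookup = ET.SubElement(rule, "list", field=bucket["field"], lookup=bucket["lookup"])
        lookup.text = f"etc/lists/{bucket['list']}"
        desc = ET.SubElement(rule, "description")
        desc.text = f"CTI match: {bucket['field']} found in {bucket['list']}"
        mitre = ET.SubElement(rule, "mitre")
        for tid in bucket["techniques"]:
            ET.SubElement(mitre, "id").text = tid
    return ET.tostring(group, encoding="unicode")


# Field names and technique IDs below are sample values.
ruleset = build_ruleset([
    {"list": "cti-malicious-ip", "field": "srcip",
     "lookup": "address_match_key", "techniques": ["T1071"]},
    {"list": "cti-malicious-domain", "field": "dns.query",
     "lookup": "match_key", "techniques": ["T1566.002"]},
])
```

Building the tree with an XML library rather than string templates keeps the output well-formed even when a description contains characters such as `&` or `<`.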
	+
	+	## Staging and promotion
	+
	+	`run` writes a candidate under `output/candidates/<bundle-id>/` with the lists, the
	+	rules, the coverage report, and a manifest carrying the diff against the last
	+	approved bundle. Nothing is active yet. `promote` (called when the analyst approves)
	+	copies the candidate into `output/active/`, records the indicator set as the new
	+	baseline for the next diff, and - if `wazuh_etc_dir` is configured - writes the lists
	+	and rules into the Wazuh manager's directories.
	+
	+	## Why a human in the loop
	+
	+	Auto-generated detections from public feeds are a false-positive risk: a shared
	+	CDN IP, a sinkholed domain, a hash collision in a list. The approval gate means a
	+	bad bundle is a rejected email, not a pager storm. The cost is a few minutes of
	+	analyst time per run, which is cheap next to chasing phantom alerts across the fleet.

docs/media/cti-approval-walkthrough.mp4 +0 -0

Binary file not shown

docs/media/cti-approval-walkthrough.webm +0 -0

Binary file not shown

docs/screenshots/01-approval-email.png +0 -0

Binary file not shown

docs/screenshots/02-review-page.png +0 -0

Binary file not shown

docs/screenshots/03-dashboard.png +0 -0

Binary file not shown