Zion Boggan zionboggan.com

readme, architecture + approval-flow docs, screenshots and walkthrough

45c66c9   Zion Boggan committed on May 28, 2026
README.md +115 -0
@@ -0,0 +1,115 @@
+# CTI Detection Automation
+
+Pulls indicators from live threat-intel feeds, deduplicates them, extracts the
+MITRE ATT&CK techniques behind them, turns the result into Wazuh detection rules
+and CDB lists, and emails an analyst for sign-off before anything goes live. No
+rule reaches the SIEM without a human approving it.
+
+This project generates the watchlists used by the
+[SOC automation lab](../soc-automation-lab): the `cti-malicious-ip`,
+`cti-malicious-domain`, and `cti-malware-hash` lists that its Wazuh rules query.
+
+![Approval email](docs/screenshots/01-approval-email.png)
+
+## What it does
+
+- **Collects** from ThreatFox, Feodo Tracker, URLhaus, AlienVault OTX, and OpenPhish,
+ covering IPs, domains, URLs, file hashes, phishing, exploit-kit and malware-download
+ URLs, C2 infrastructure, and (optionally) leaked credentials.
+- **Normalizes** every source into one indicator model and **deduplicates** across
+ feeds - an IP seen in both ThreatFox and Feodo becomes one indicator carrying both
+ sources, the higher confidence, and the union of techniques.
+- **Extracts and deduplicates TTPs**: ATT&CK technique IDs come straight from OTX pulses and
+ are inferred from malware family and threat type for the feeds that don't carry them,
+ then collapsed into a ranked coverage list.
+- **Generates Wazuh rules**: CDB lists per indicator type plus an XML ruleset whose
+ list-lookup rules are tagged with the dominant ATT&CK techniques for that bucket.
+- **Gates on approval**: every run produces a candidate bundle and emails the analyst a
+ signed, time-limited review link. Rules are written to Wazuh only when the analyst
+ approves.
+
+```mermaid
+flowchart LR
+ F[ThreatFox / Feodo / URLhaus<br/>OTX / OpenPhish] --> N[normalize]
+ N --> D[deduplicate]
+ D --> C{confidence<br/>>= threshold}
+ C --> T[extract + dedup TTPs]
+ T --> G[generate CDB lists<br/>+ Wazuh rules]
+ G --> M[email analyst<br/>signed review link]
+ M -->|approve| W[promote to Wazuh]
+ M -->|reject| X[discard]
+```
+
+## The approval gate
+
+The pipeline never deploys on its own. It writes the candidate to a staging
+directory and emails a review link. The analyst sees the diff against the last
+approved bundle, the indicator counts, and the ATT&CK coverage, then approves or
+rejects.
+
+| Review page | Bundle history |
+|---|---|
+| ![Review](docs/screenshots/02-review-page.png) | ![Dashboard](docs/screenshots/03-dashboard.png) |
+
+Short walkthrough of the flow: [docs/media/cti-approval-walkthrough.mp4](docs/media/cti-approval-walkthrough.mp4)
+
+The review link is an `itsdangerous` signed token with a TTL, so it can't be
+forged without the signing secret and stops working once it expires.
+
+## Run the demo
+
+No API keys needed - the demo runs against the bundled fixtures:
+
+```bash
+make install
+make demo # generates a bundle and writes the approval email to output/emails/
+make serve # approval console on http://localhost:8080
+```
+
+Open the email under `output/emails/`, click through to the review page, and
+approve. The promoted lists and rules land in `output/active/`.
+
+## Run it for real
+
+```bash
+cp .env.example .env # set CTI_APPROVAL_SECRET and feed API keys
+cp config.example.yaml config.yaml
+python -m cti.cli run -c config.yaml
+```
+
+ThreatFox and OTX need free API keys (`THREATFOX_AUTH_KEY`, `OTX_API_KEY`); Feodo,
+URLhaus, and OpenPhish are keyless. Set `email.backend: smtp` and the SMTP settings
+in `config.yaml` to send the approval mail to a real inbox. Point `wazuh_etc_dir` at
+the manager's `/var/ossec/etc` and approval will write the lists and rules straight
+into place.
+
+Schedule it with the systemd timer in `deploy/` (hourly by default) or a cron job.
+
+## Layout
+
+```
+src/cti/feeds/ one connector per source, each with a pure parse()
+src/cti/dedup.py cross-feed merge
+src/cti/ttp.py ATT&CK extraction + coverage report
+src/cti/rules.py CDB list + Wazuh XML generation
+src/cti/approval.py signed tokens + email rendering + SMTP
+src/cti/pipeline.py orchestration, candidate staging, promotion
+src/cti/web.py Flask approval console
+fixtures/ sample feed payloads for the demo and tests
+tests/ pytest suite (feeds, dedup, ttp, rules, approval, web)
+```
+
+## Tests
+
+```bash
+make test
+```
+
+Connectors are split into `fetch` and `parse`, so the suite runs the real parsing
+logic against fixtures with no network. Coverage includes cross-feed dedup, TTP
+extraction, rule generation well-formedness, token signing/expiry, and the full
+approve/reject path through the web app.
+
+See [docs/architecture.md](docs/architecture.md) for the data model and
+[docs/approval-flow.md](docs/approval-flow.md) for the gate in detail.
docs/approval-flow.md +64 -0
@@ -0,0 +1,64 @@
+# Approval flow
+
+The gate is the point of the project, so it gets its own document.
+
+## 1. Bundle generated
+
+A pipeline run produces a candidate bundle and stages it under
+`output/candidates/<bundle-id>/`:
+
+```
+lists/cti-malicious-ip
+lists/cti-malicious-domain
+lists/cti-malicious-url
+lists/cti-malware-hash
+lists/cti-leaked-email
+local_cti_rules.xml
+ttp_coverage.md
+manifest.json
+keys.json
+```
+
+`manifest.json` carries the counts, the technique list, and the diff against the
+last approved bundle (added / removed / unchanged).
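
The added / removed / unchanged split described above is plain set arithmetic. The following is a minimal sketch, assuming each indicator is reduced to a `type:value` string key; `diff_indicators` and that key format are hypothetical names for illustration, not the project's actual manifest code.

```python
from typing import Dict, List, Set


def diff_indicators(baseline: Set[str], candidate: Set[str]) -> Dict[str, List[str]]:
    """Compare a candidate's indicator keys with the last approved set."""
    return {
        "added": sorted(candidate - baseline),      # new in this run
        "removed": sorted(baseline - candidate),    # aged out of the feeds
        "unchanged": sorted(baseline & candidate),  # carried over as-is
    }


# Hypothetical keys; the addresses come from documentation-only ranges.
baseline = {"ip:203.0.113.7", "domain:bad.example"}
candidate = {"ip:203.0.113.7", "ip:198.51.100.9"}

diff = diff_indicators(baseline, candidate)
print(diff)
```

Sorting each bucket keeps the manifest stable between runs, so two runs over the same feeds would produce byte-identical diffs.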
+
+## 2. Analyst emailed
+
+The email summarizes the bundle - new vs. aged-out indicators, counts by type, and
+the extracted ATT&CK techniques - and links to the review page. The link embeds an
+`itsdangerous` signed token:
+
+```
+URLSafeTimedSerializer(secret, salt="cti-rule-approval").dumps({"bundle_id": ...})
+```
+
+The token is signed with `CTI_APPROVAL_SECRET` and carries a TTL (`token_ttl`,
+default 24h). It can't be forged without the secret and stops working once it
+expires, so a stale email can't approve an old bundle.
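
With `itsdangerous`, the verifying side is `loads(token, max_age=...)`, which raises `SignatureExpired` or `BadSignature` on failure. To show what the signature and TTL buy without depending on that library, here is a standard-library sketch of the same idea. It is not the `itsdangerous` token format: `make_token`, `read_token`, and the `iat` claim are hypothetical names, and only the salt string is taken from the text above.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SALT = b"cti-rule-approval"  # salt string from the doc; the rest is illustrative


def _b64e(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).decode().rstrip("=")


def _b64d(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, body: bytes) -> bytes:
    # Derive a per-purpose key from the secret and salt, then MAC the payload.
    key = hmac.new(secret, SALT, hashlib.sha256).digest()
    return hmac.new(key, body, hashlib.sha256).digest()


def make_token(secret: bytes, bundle_id: str, now: Optional[float] = None) -> str:
    issued = int(time.time() if now is None else now)
    body = json.dumps({"bundle_id": bundle_id, "iat": issued}).encode()
    return f"{_b64e(body)}.{_b64e(_sign(secret, body))}"


def read_token(secret: bytes, token: str, ttl: int, now: Optional[float] = None) -> Optional[str]:
    """Return the bundle id, or None if the token is malformed, forged, or expired."""
    try:
        body_part, sig_part = token.split(".")
        body, sig = _b64d(body_part), _b64d(sig_part)
    except ValueError:
        return None
    if not hmac.compare_digest(sig, _sign(secret, body)):
        return None  # signature does not match this secret
    claims = json.loads(body)
    current = time.time() if now is None else now
    if current - claims["iat"] > ttl:
        return None  # older than the TTL
    return claims["bundle_id"]


secret = b"demo-secret"
token = make_token(secret, "bundle-001", now=1_000)
fresh = read_token(secret, token, ttl=86_400, now=2_000)            # inside the 24h window
stale = read_token(secret, token, ttl=86_400, now=1_000 + 90_000)   # past the window
forged = read_token(b"other-secret", token, ttl=86_400, now=2_000)  # wrong signing key
print(fresh, stale, forged)
```

The payload is checked before it is parsed, and the comparison uses `hmac.compare_digest` so that verification time does not leak how much of a guessed signature was correct.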
+
+The email backend is pluggable: `file` (writes the rendered email to disk, used by
+the demo), `console`, or `smtp` for a real inbox.
+
+## 3. Review
+
+The review page renders the manifest - the diff cards, indicators by type, the
+generated CDB lists and their sizes, and the full ATT&CK coverage table. The analyst
+has the whole picture without leaving the page.
+
+## 4. Decision
+
+- **Approve** → `promote` copies the candidate into `output/active/`, updates the
+ baseline used for the next diff, and (when `wazuh_etc_dir` is set) writes the lists
+ and rules into the Wazuh manager. A reload picks them up.
+- **Reject** → a `REJECTED` marker with the optional reason is written to the
+ candidate directory and nothing is deployed.
+
+Either way the decision is recorded and the bundle's status shows on the dashboard.
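
The two outcomes can be sketched with the standard library as below. This is an illustration under stated assumptions, not the code in `src/cti/pipeline.py`: the function signatures are hypothetical, the baseline update is omitted, and the `lists/` and `rules/` subdirectories follow the usual Wazuh `etc` layout.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def promote(candidate: Path, active: Path, wazuh_etc_dir: Optional[Path] = None) -> None:
    """Copy an approved candidate into the active directory and, optionally, into Wazuh."""
    shutil.copytree(candidate, active, dirs_exist_ok=True)
    if wazuh_etc_dir is not None:
        # CDB lists go under etc/lists, the generated ruleset under etc/rules.
        shutil.copytree(candidate / "lists", wazuh_etc_dir / "lists", dirs_exist_ok=True)
        rules_dir = wazuh_etc_dir / "rules"
        rules_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(candidate / "local_cti_rules.xml", rules_dir / "local_cti_rules.xml")


def reject(candidate: Path, reason: str = "") -> None:
    """Leave a REJECTED marker in the candidate directory; nothing is deployed."""
    (candidate / "REJECTED").write_text(reason)


# Demo against a throwaway directory tree with two staged bundles.
root = Path(tempfile.mkdtemp())
for bundle_id in ("bundle-001", "bundle-002"):
    staged = root / "candidates" / bundle_id
    (staged / "lists").mkdir(parents=True)
    (staged / "lists" / "cti-malicious-ip").write_text("203.0.113.7:botnet_cc\n")
    (staged / "local_cti_rules.xml").write_text('<group name="cti,"></group>\n')

promote(root / "candidates" / "bundle-001", root / "active", wazuh_etc_dir=root / "etc")
reject(root / "candidates" / "bundle-002", reason="shared CDN address in the IP list")

for path in sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()):
    print(path)
```

Because rejection only writes a marker inside the candidate directory, a rejected bundle stays on disk for later inspection while `active/` is left untouched.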
+
+## Token, not session
+
+There's deliberately no login. The signed link is the authorization - it's what was
+emailed to the analyst, it's scoped to one bundle, and it expires. That keeps the
+service stateless and means it can sit behind a reverse proxy on the SOC network
+without its own user store. For a multi-analyst setup you'd put SSO in front of it;
+the token model doesn't change.
docs/architecture.md +79 -0
@@ -0,0 +1,79 @@
+# Architecture
+
+## Indicator model
+
+Everything from every feed is normalized into one `Indicator`:
+
+```
+type ip | domain | url | sha256 | md5 | sha1 | email
+value the indicator itself
+source comma-joined list of feeds that reported it
+threat_type botnet_cc | phishing | payload_delivery | ...
+confidence 0-100
+malware family name when known
+techniques list of ATT&CK technique IDs
+tags free-form labels from the source
+```
+
+A feed connector does two things: `fetch_raw()` talks to the API, and `parse()`
+turns the raw payload into `Indicator` objects. They're separate so the parsing
+logic can be tested against fixtures without a network, and so adding a feed is
+just one new `parse()`.
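
A rough sketch of that split, assuming a made-up feed: `ExampleFeed` and its JSON payload shape are hypothetical and do not correspond to any of the real connectors. The point is that `parse()` depends only on its argument, so a fixture string is enough to exercise it.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Indicator:
    type: str
    value: str
    source: str
    threat_type: str
    confidence: int
    malware: str = ""
    techniques: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)


class ExampleFeed:
    """Hypothetical connector; the payload shape is invented for illustration."""

    name = "examplefeed"

    def fetch_raw(self) -> str:
        # A real connector would perform the HTTP request here.
        raise NotImplementedError("network access is out of scope for this sketch")

    def parse(self, raw: str) -> List[Indicator]:
        # Pure function of its input: no network, no clock, no globals.
        return [
            Indicator(
                type="ip",
                value=row["ip"],
                source=self.name,
                threat_type="botnet_cc",
                confidence=int(row.get("confidence", 50)),
                malware=row.get("malware", ""),
            )
            for row in json.loads(raw)
        ]


# A fixture stands in for the API response, so no network is needed.
fixture = '[{"ip": "203.0.113.7", "malware": "Emotet", "confidence": 90}]'
indicators = ExampleFeed().parse(fixture)
print(indicators[0])
```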
+
+## Deduplication
+
+Feeds overlap heavily - the same C2 IP shows up in ThreatFox and Feodo, the same
+domain in OTX and URLhaus. Dedup keys on `(type, lowercased value)` and merges:
+
+- confidence becomes the max across sources
+- techniques and tags become the union
+- sources are concatenated, so provenance is preserved
+- malware family and first-seen are taken from the first source that had them
+
+The result is one record per indicator that's stronger than any single feed's view
+of it.
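
The merge rules above can be sketched as follows. This is a minimal illustration, not the contents of `src/cti/dedup.py`: the trimmed-down `Indicator` and the `deduplicate` function are assumptions, and first-seen handling is left out.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Iterable, List, Tuple


@dataclass
class Indicator:
    type: str
    value: str
    source: str
    confidence: int
    malware: str = ""
    techniques: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)


def deduplicate(indicators: Iterable[Indicator]) -> List[Indicator]:
    merged: Dict[Tuple[str, str], Indicator] = {}
    for ind in indicators:
        key = (ind.type, ind.value.lower())
        kept = merged.get(key)
        if kept is None:
            # Copy so the caller's objects are not mutated by later merges.
            merged[key] = replace(ind, techniques=list(ind.techniques), tags=list(ind.tags))
            continue
        kept.confidence = max(kept.confidence, ind.confidence)
        kept.techniques = sorted(set(kept.techniques) | set(ind.techniques))
        kept.tags = sorted(set(kept.tags) | set(ind.tags))
        if ind.source not in kept.source.split(","):
            kept.source = f"{kept.source},{ind.source}"
        # First source that names a family wins.
        kept.malware = kept.malware or ind.malware
    return list(merged.values())


reports = [
    Indicator("ip", "203.0.113.7", "threatfox", 75, techniques=["T1071.001"]),
    Indicator("ip", "203.0.113.7", "feodo", 90, malware="Emotet", techniques=["T1573"]),
    Indicator("domain", "Bad.Example", "otx", 60),
    Indicator("domain", "bad.example", "urlhaus", 40),
]
unique = deduplicate(reports)
for ind in unique:
    print(ind.type, ind.value, ind.source, ind.confidence, ind.malware, ind.techniques)
```

Only the dictionary key is lowercased here; the stored value keeps the spelling from the first source, which matters for URLs whose paths may be case-sensitive.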
+
+## TTP extraction
+
+Techniques come from three places, in order of trust:
+
+1. **Directly from the feed** - OTX pulses carry `attack_ids`, which are used as-is.
+2. **From the malware family** - a CobaltStrike or AgentTesla indicator maps to that
+ family's common techniques (`src/cti/mitre.py`).
+3. **From the threat type** - a phishing URL maps to T1566.002 / T1204.001 even when
+ no family is named.
+
+Extraction then collapses these across all indicators into a ranked
+`Technique` list with per-technique indicator counts and contributing sources, which
+becomes the coverage report attached to every bundle.
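
A sketch of the resolution order and the ranking step, under two assumptions: feed-supplied techniques are used alone when present, with family and threat-type mappings as the fallback; and the mapping tables below are illustrative stand-ins rather than the contents of `src/cti/mitre.py`. Indicators are plain dicts here to keep the example short.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

# Illustrative mappings only; the project's real tables live in src/cti/mitre.py.
FAMILY_TECHNIQUES = {
    "cobaltstrike": ["T1071.001", "T1055"],
    "agenttesla": ["T1056.001"],
}
THREAT_TYPE_TECHNIQUES = {
    "phishing": ["T1566.002", "T1204.001"],  # pairing named in the text above
}


def techniques_for(ind: dict) -> List[str]:
    """Resolve techniques for one indicator, most trusted source first."""
    if ind.get("techniques"):
        return list(ind["techniques"])  # carried by the feed, used as-is
    inferred = FAMILY_TECHNIQUES.get(ind.get("malware", "").lower(), [])
    inferred = inferred + THREAT_TYPE_TECHNIQUES.get(ind.get("threat_type", ""), [])
    return list(dict.fromkeys(inferred))  # drop repeats, keep order


def coverage(indicators: List[dict]) -> List[Tuple[str, int, List[str]]]:
    """Rank techniques by indicator count, with the feeds that contributed to each."""
    counts: Counter = Counter()
    sources: Dict[str, Set[str]] = defaultdict(set)
    for ind in indicators:
        for tid in techniques_for(ind):
            counts[tid] += 1
            sources[tid].update(ind["source"].split(","))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(tid, n, sorted(sources[tid])) for tid, n in ranked]


sample = [
    {"source": "otx", "techniques": ["T1566.002"], "threat_type": "phishing"},
    {"source": "openphish", "threat_type": "phishing"},
    {"source": "threatfox,feodo", "malware": "CobaltStrike", "threat_type": "botnet_cc"},
]
report = coverage(sample)
for row in report:
    print(row)
```

Ties are broken by technique ID so the coverage report is deterministic across runs.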
+
+## Rule generation
+
+Two artifacts come out of a bundle:
+
+**CDB lists** - one file per indicator type (`cti-malicious-ip`,
+`cti-malicious-domain`, `cti-malicious-url`, `cti-malware-hash`, `cti-leaked-email`),
+in Wazuh's `key:value` format where the value is the malware family or threat type.
+These are the actual lookup tables.
+
+**XML rules** - list-lookup rules (one per type present) that fire when a field
+matches an entry in the corresponding CDB list. Each rule is tagged with the
+techniques that dominate that bucket, so an alert arrives already mapped to ATT&CK.
+Rule IDs start at a configurable base (default 100300) to stay clear of the bundled
+Wazuh ruleset and the lab's hand-written rules.
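
Both artifacts can be sketched in a few lines. The `<list field=... lookup=...>` and `<mitre><id>` elements are standard Wazuh rule syntax; everything else here is an assumption for illustration, including the function names, the rule levels, the descriptions, and the `dns.question.name` field, which depends on how the lab's decoders name things.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def render_cdb(entries: Dict[str, str]) -> str:
    """One `key:value` line per indicator; the value is the family or threat type."""
    return "".join(f"{key}:{value}\n" for key, value in sorted(entries.items()))


def render_rules(buckets: List[dict], base_id: int = 100300) -> str:
    """Build one list-lookup rule per bucket, numbered upward from base_id."""
    group = ET.Element("group", name="cti,")
    for offset, bucket in enumerate(buckets):
        rule = ET.SubElement(group, "rule", id=str(base_id + offset), level=str(bucket["level"]))
        lookup = ET.SubElement(rule, "list", field=bucket["field"], lookup=bucket["lookup"])
        lookup.text = f"etc/lists/{bucket['list']}"
        ET.SubElement(rule, "description").text = bucket["description"]
        mitre = ET.SubElement(rule, "mitre")
        for tid in bucket["techniques"]:
            ET.SubElement(mitre, "id").text = tid
    ET.indent(group)  # pretty-print; requires Python 3.9+
    return ET.tostring(group, encoding="unicode")


cdb_text = render_cdb({"203.0.113.7": "Emotet", "198.51.100.9": "botnet_cc"})
xml_text = render_rules([
    {   # field, level, and description are assumed values
        "list": "cti-malicious-ip", "field": "srcip", "lookup": "address_match_key",
        "level": 10, "description": "Source IP found in CTI malicious IP list",
        "techniques": ["T1071.001"],
    },
    {
        "list": "cti-malicious-domain", "field": "dns.question.name", "lookup": "match_key",
        "level": 10, "description": "Domain found in CTI malicious domain list",
        "techniques": ["T1566.002", "T1204.001"],
    },
])
print(cdb_text, end="")
print(xml_text)
```

Building the ruleset with `ElementTree` instead of string formatting means descriptions containing `<` or `&` are escaped automatically, which is one way to keep the generated XML well-formed.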
+
+## Staging and promotion
+
+`run` writes a candidate under `output/candidates/<bundle-id>/` with the lists, the
+rules, the coverage report, and a manifest carrying the diff against the last
+approved bundle. Nothing is active yet. `promote` (called when the analyst approves)
+copies the candidate into `output/active/`, records the indicator set as the new
+baseline for the next diff, and - if `wazuh_etc_dir` is configured - writes the lists
+and rules into the Wazuh manager's directories.
+
+## Why a human in the loop
+
+Auto-generated detections from public feeds are a false-positive risk: a shared
+CDN IP, a sinkholed domain, a hash collision in a list. The approval gate means a
+bad bundle is a rejected email, not a pager storm. The cost is a few minutes of
+analyst time per run, which is cheap next to chasing phantom alerts across the fleet.
docs/media/cti-approval-walkthrough.mp4 +0 -0
Binary file not shown
docs/media/cti-approval-walkthrough.webm +0 -0
Binary file not shown
docs/screenshots/01-approval-email.png +0 -0
Binary file not shown
docs/screenshots/02-review-page.png +0 -0
Binary file not shown
docs/screenshots/03-dashboard.png +0 -0
Binary file not shown