| @@ -0,0 +1,115 @@ | ||
| + | # CTI Detection Automation | |
| + | ||
| + | Pulls indicators from live threat-intel feeds, deduplicates them, extracts the | |
| + | MITRE ATT&CK techniques behind them, turns the result into Wazuh detection rules | |
| + | and CDB lists, and emails an analyst for sign-off before anything goes live. No | |
| + | rule reaches the SIEM without a human approving it. | |
| + | ||
| + | This is the piece that feeds the watchlists used by the | 
| + | [SOC automation lab](../soc-automation-lab): the `cti-malicious-ip`, | 
| + | `cti-malicious-domain`, and `cti-malware-hash` lists that the lab's Wazuh rules query | 
| + | are generated here. | 
| + | ||
| + |  | |
| + | ||
| + | ## What it does | |
| + | ||
| + | - **Collects** from ThreatFox, Feodo Tracker, URLhaus, AlienVault OTX, and OpenPhish, | 
| + |   covering IPs, domains, URLs (phishing, exploit-kit, and malware-download), file | 
| + |   hashes, C2 infrastructure, and (optionally) leaked credentials. | 
| + | - **Normalizes** every source into one indicator model and **deduplicates** across | |
| + | feeds - an IP seen in both ThreatFox and Feodo becomes one indicator carrying both | |
| + | sources, the higher confidence, and the union of techniques. | |
| + | - **Extracts and dedups TTPs**: ATT&CK technique IDs come straight from OTX pulses and | |
| + | are inferred from malware family and threat type for the feeds that don't carry them, | |
| + | then collapsed into a ranked coverage list. | |
| + | - **Generates Wazuh rules**: CDB lists per indicator type plus an XML ruleset whose | |
| + | list-lookup rules are tagged with the dominant ATT&CK techniques for that bucket. | |
| + | - **Gates on approval**: every run produces a candidate bundle and emails the analyst a | |
| + | signed, time-limited review link. Rules are written to Wazuh only when the analyst | |
| + | approves. | |
| + | ||
| + | ```mermaid | |
| + | flowchart LR | |
| + | F[ThreatFox / Feodo / URLhaus<br/>OTX / OpenPhish] --> N[normalize] | |
| + | N --> D[deduplicate] | |
| + | D --> C{confidence<br/>>= threshold} | |
| + | C --> T[extract + dedup TTPs] | |
| + | T --> G[generate CDB lists<br/>+ Wazuh rules] | |
| + | G --> M[email analyst<br/>signed review link] | |
| + | M -->|approve| W[promote to Wazuh] | |
| + | M -->|reject| X[discard] | |
| + | ``` | |
| + | ||
| + | ## The approval gate | |
| + | ||
| + | The pipeline never deploys on its own. It writes the candidate to a staging | |
| + | directory and emails a review link. The analyst sees the diff against the last | |
| + | approved bundle, the indicator counts, and the ATT&CK coverage, then approves or | |
| + | rejects. | |
| + | ||
| + | | Review page | Bundle history | | |
| + | |---|---| | |
| + | |  |  | | |
| + | ||
| + | Short walkthrough of the flow: [docs/media/cti-approval-walkthrough.mp4](docs/media/cti-approval-walkthrough.mp4) | |
| + | ||
| + | The review link carries an `itsdangerous` signed token, so it can't be forged | 
| + | without the signing secret, and it stops working once its TTL expires. | 
| + | ||
| + | ## Run the demo | |
| + | ||
| + | No API keys needed - the demo runs against the bundled fixtures: | |
| + | ||
| + | ```bash | |
| + | make install | |
| + | make demo # generates a bundle and writes the approval email to output/emails/ | |
| + | make serve # approval console on http://localhost:8080 | |
| + | ``` | |
| + | ||
| + | Open the email under `output/emails/`, click through to the review page, and | |
| + | approve. The promoted lists and rules land in `output/active/`. | |
| + | ||
| + | ## Run it for real | |
| + | ||
| + | ```bash | |
| + | cp .env.example .env # set CTI_APPROVAL_SECRET and feed API keys | |
| + | cp config.example.yaml config.yaml | |
| + | python -m cti.cli run -c config.yaml | |
| + | ``` | |
| + | ||
| + | ThreatFox and OTX need free API keys (`THREATFOX_AUTH_KEY`, `OTX_API_KEY`); Feodo, | |
| + | URLhaus, and OpenPhish are keyless. Set `email.backend: smtp` and the SMTP settings | |
| + | in `config.yaml` to send the approval mail to a real inbox. Point `wazuh_etc_dir` at | |
| + | the manager's `/var/ossec/etc` and approval will write the lists and rules straight | |
| + | into place. | |
| + | ||
| + | Schedule it with the systemd timer in `deploy/` (hourly by default) or from cron. | 
| + | ||
| + | ## Layout | |
| + | ||
| + | ``` | |
| + | src/cti/feeds/ one connector per source, each with a pure parse() | |
| + | src/cti/dedup.py cross-feed merge | |
| + | src/cti/ttp.py ATT&CK extraction + coverage report | |
| + | src/cti/rules.py CDB list + Wazuh XML generation | |
| + | src/cti/approval.py signed tokens + email rendering + SMTP | |
| + | src/cti/pipeline.py orchestration, candidate staging, promotion | |
| + | src/cti/web.py Flask approval console | |
| + | fixtures/ sample feed payloads for the demo and tests | |
| + | tests/ pytest suite (feeds, dedup, ttp, rules, approval, web) | |
| + | ``` | |
| + | ||
| + | ## Tests | |
| + | ||
| + | ```bash | |
| + | make test | |
| + | ``` | |
| + | ||
| + | Connectors separate fetching from a pure `parse()`, so the suite runs the real | 
| + | parsing logic against fixtures with no network. Coverage includes cross-feed | 
| + | dedup, TTP extraction, well-formedness of the generated rules, token | 
| + | signing/expiry, and the full approve/reject path through the web app. | 
| + | ||
| + | See [docs/architecture.md](docs/architecture.md) for the data model and | |
| + | [docs/approval-flow.md](docs/approval-flow.md) for the gate in detail. |
| @@ -0,0 +1,64 @@ | ||
| + | # Approval flow | |
| + | ||
| + | The gate is the point of the project, so it gets its own document. | |
| + | ||
| + | ## 1. Bundle generated | |
| + | ||
| + | A pipeline run produces a candidate bundle and stages it under | |
| + | `output/candidates/<bundle-id>/`: | |
| + | ||
| + | ``` | |
| + | lists/cti-malicious-ip | |
| + | lists/cti-malicious-domain | |
| + | lists/cti-malicious-url | |
| + | lists/cti-malware-hash | |
| + | lists/cti-leaked-email | |
| + | local_cti_rules.xml | |
| + | ttp_coverage.md | |
| + | manifest.json | |
| + | keys.json | |
| + | ``` | |
| + | ||
| + | `manifest.json` carries the counts, the technique list, and the diff against the | |
| + | last approved bundle (added / removed / unchanged). | |
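The diff in the manifest is plain set arithmetic over indicator keys. The sketch below assumes indicators are keyed by `(type, lowercased value)` (the dedup key described in the architecture notes) and that the baseline is simply the key set of the last approved bundle. The manifest layout and the helper names `diff_indicators`, `build_manifest`, and `write_manifest` are hypothetical, not the project's actual schema.

```python
import json
from pathlib import Path
from typing import Dict, List, Set, Tuple

Key = Tuple[str, str]  # (indicator type, lowercased value)

def diff_indicators(previous: Set[Key], current: Set[Key]) -> Dict[str, List[Key]]:
    """Split the current indicator set into added / removed / unchanged vs. the baseline."""
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
        "unchanged": sorted(current & previous),
    }

def build_manifest(bundle_id: str, current: Set[Key], baseline: Set[Key],
                   techniques: List[str]) -> dict:
    # Count indicators per type for the summary.
    counts: Dict[str, int] = {}
    for ind_type, _value in current:
        counts[ind_type] = counts.get(ind_type, 0) + 1
    diff = diff_indicators(baseline, current)
    return {
        "bundle_id": bundle_id,
        "counts": counts,
        "techniques": techniques,
        # Hypothetical layout: sizes for the diff cards plus the raw added/removed keys.
        "diff": {name: len(keys) for name, keys in diff.items()},
        "added": [list(k) for k in diff["added"]],
        "removed": [list(k) for k in diff["removed"]],
    }

def write_manifest(candidate_dir: Path, manifest: dict) -> Path:
    """Write manifest.json into the staged candidate directory."""
    candidate_dir.mkdir(parents=True, exist_ok=True)
    path = candidate_dir / "manifest.json"
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return path
```

On the very first run the baseline is empty, so every indicator lands in `added`.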
| + | ||
| + | ## 2. Analyst emailed | |
| + | ||
| + | The email summarizes the bundle - new vs. aged-out indicators, counts by type, and | |
| + | the extracted ATT&CK techniques - and links to the review page. The link embeds an | |
| + | `itsdangerous` signed token: | |
| + | ||
| + | ```python | 
| + | from itsdangerous import URLSafeTimedSerializer | 
| + | URLSafeTimedSerializer(secret, salt="cti-rule-approval").dumps({"bundle_id": ...}) | 
| + | ``` | 
| + | ||
| + | The token is signed with `CTI_APPROVAL_SECRET` and embeds its signing time; on | 
| + | verification the service rejects it once it is older than `token_ttl` (default | 
| + | 24h). It can't be forged without the secret and stops working once it expires, so | 
| + | a stale email can't approve an old bundle. | 
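A minimal sketch of both halves of the token, using the real `itsdangerous` API (`URLSafeTimedSerializer.dumps` / `.loads(max_age=...)`). The salt matches the snippet above; the helper names `make_token` and `verify_token` and the `(bundle_id, reason)` return shape are hypothetical. Note that `SignatureExpired` is a subclass of `BadSignature`, so it has to be caught first.

```python
from typing import Optional, Tuple

from itsdangerous import BadSignature, SignatureExpired, URLSafeTimedSerializer

SALT = "cti-rule-approval"
DEFAULT_TTL = 24 * 60 * 60  # token_ttl default (24h), in seconds

def make_token(secret: str, bundle_id: str) -> str:
    """Sign a payload naming exactly one bundle; the signing time is embedded."""
    return URLSafeTimedSerializer(secret, salt=SALT).dumps({"bundle_id": bundle_id})

def verify_token(secret: str, token: str,
                 ttl: int = DEFAULT_TTL) -> Tuple[Optional[str], Optional[str]]:
    """Return (bundle_id, None) on success, or (None, reason) on failure."""
    serializer = URLSafeTimedSerializer(secret, salt=SALT)
    try:
        payload = serializer.loads(token, max_age=ttl)
    except SignatureExpired:
        return None, "expired"  # authentic, but older than ttl: a stale email
    except BadSignature:
        return None, "invalid"  # tampered, wrong secret, or wrong salt
    return payload["bundle_id"], None
```

Both failure cases should render the same "link no longer valid" page; the reason is only useful for logging.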
| + | ||
| + | The backend is pluggable: `file` (writes the rendered email to disk, used by the | |
| + | demo), `console`, or `smtp` for a real inbox. | |
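One way to make the backend pluggable is a small registry of classes that share a `send()` method, selected by the `email.backend` config value. This is a sketch under that assumption, built on the standard library's `email.message.EmailMessage` and `smtplib`. The class names, the `X-CTI-Bundle` header used to name files, and the SMTP settings are hypothetical.

```python
import smtplib
from email.message import EmailMessage
from pathlib import Path
from typing import Optional

def build_message(sender: str, recipient: str, bundle_id: str,
                  review_url: str, summary: str) -> EmailMessage:
    msg = EmailMessage()
    msg["Subject"] = f"[CTI] Bundle {bundle_id} awaiting review"
    msg["From"] = sender
    msg["To"] = recipient
    msg["X-CTI-Bundle"] = bundle_id  # hypothetical header, used to name files
    msg.set_content(f"{summary}\n\nReview, then approve or reject:\n{review_url}\n")
    return msg

class FileBackend:
    """Write the rendered message to disk (the demo writes its mail to output/emails/)."""
    def __init__(self, out_dir: str) -> None:
        self.out_dir = Path(out_dir)

    def send(self, msg: EmailMessage) -> Path:
        self.out_dir.mkdir(parents=True, exist_ok=True)
        path = self.out_dir / f"{msg['X-CTI-Bundle']}.eml"
        path.write_bytes(msg.as_bytes())
        return path

class ConsoleBackend:
    def send(self, msg: EmailMessage) -> None:
        print(msg.as_string())

class SmtpBackend:
    def __init__(self, host: str, port: int = 587, username: Optional[str] = None,
                 password: Optional[str] = None, starttls: bool = True) -> None:
        self.host, self.port = host, port
        self.username, self.password, self.starttls = username, password, starttls

    def send(self, msg: EmailMessage) -> None:
        with smtplib.SMTP(self.host, self.port, timeout=30) as smtp:
            if self.starttls:
                smtp.starttls()
            if self.username:
                smtp.login(self.username, self.password or "")
            smtp.send_message(msg)

BACKENDS = {"file": FileBackend, "console": ConsoleBackend, "smtp": SmtpBackend}

def make_backend(name: str, **settings):
    """Pick a backend by name; settings are passed to its constructor."""
    return BACKENDS[name](**settings)
```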
| + | ||
| + | ## 3. Review | |
| + | ||
| + | The review page renders the manifest - the diff cards, indicators by type, the | |
| + | generated CDB lists and their sizes, and the full ATT&CK coverage table. The analyst | |
| + | has the whole picture without leaving the page. | |
| + | ||
| + | ## 4. Decision | |
| + | ||
| + | - **Approve** → `promote` copies the candidate into `output/active/`, updates the | |
| + | baseline used for the next diff, and (when `wazuh_etc_dir` is set) writes the lists | |
| + | and rules into the Wazuh manager. A reload picks them up. | |
| + | - **Reject** → a `REJECTED` marker with the optional reason is written to the | |
| + | candidate directory and nothing is deployed. | |
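Both outcomes can be sketched as plain filesystem operations on the staged candidate. The sketch assumes the Wazuh targets are `etc/lists/` for CDB lists and `etc/rules/` for the XML ruleset under `wazuh_etc_dir`. The `APPROVED` marker and the `status()` helper are hypothetical stand-ins for however the decision is actually recorded. Updating the diff baseline is left out here.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

def _stamp() -> str:
    return datetime.now(timezone.utc).isoformat()

def approve(candidate: Path, active: Path, wazuh_etc_dir: Optional[Path] = None) -> None:
    """Promote a staged candidate bundle."""
    if active.exists():
        shutil.rmtree(active)  # output/active/ holds exactly one bundle
    shutil.copytree(candidate, active)
    if wazuh_etc_dir is not None:
        lists_dir = wazuh_etc_dir / "lists"
        rules_dir = wazuh_etc_dir / "rules"
        lists_dir.mkdir(parents=True, exist_ok=True)
        rules_dir.mkdir(parents=True, exist_ok=True)
        for cdb in (candidate / "lists").iterdir():
            shutil.copy2(cdb, lists_dir / cdb.name)
        shutil.copy2(candidate / "local_cti_rules.xml", rules_dir / "local_cti_rules.xml")
    (candidate / "APPROVED").write_text(_stamp() + "\n")  # hypothetical marker

def reject(candidate: Path, reason: str = "") -> None:
    """Record the rejection next to the candidate; nothing is copied anywhere."""
    (candidate / "REJECTED").write_text(f"{_stamp()}\n{reason}\n")

def status(candidate: Path) -> str:
    """Derive the dashboard status from the marker files."""
    if (candidate / "REJECTED").exists():
        return "rejected"
    if (candidate / "APPROVED").exists():
        return "approved"
    return "pending"
```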
| + | ||
| + | Either way the decision is recorded and the bundle's status shows on the dashboard. | |
| + | ||
| + | ## Token, not session | |
| + | ||
| + | There's deliberately no login. The signed link is the authorization - it's what was | |
| + | emailed to the analyst, it's scoped to one bundle, and it expires. That keeps the | |
| + | service stateless and means it can sit behind a reverse proxy on the SOC network | |
| + | without its own user store. For a multi-analyst setup you'd put SSO in front of it; | |
| + | the token model doesn't change. |
| @@ -0,0 +1,79 @@ | ||
| + | # Architecture | |
| + | ||
| + | ## Indicator model | |
| + | ||
| + | Everything from every feed is normalized into one `Indicator`: | |
| + | ||
| + | ``` | |
| + | type ip | domain | url | sha256 | md5 | sha1 | email | |
| + | value the indicator itself | |
| + | source comma-joined list of feeds that reported it | |
| + | threat_type botnet_cc | phishing | payload_delivery | ... | |
| + | confidence 0-100 | |
| + | malware family name when known | |
| + | techniques list of ATT&CK technique IDs | |
| + | tags free-form labels from the source | |
| + | ``` | |
| + | ||
| + | A feed connector does two things: `fetch_raw()` talks to the API, and `parse()` | |
| + | turns the raw payload into `Indicator` objects. They're separate so the parsing | |
| + | logic can be tested against fixtures without a network, and so adding a feed is | |
| + | just one new `parse()`. | |
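A sketch of the model and the fetch/parse split, assuming a dataclass for `Indicator` with the fields listed above. `ExampleC2Feed`, its URL, and the JSON shape it parses are hypothetical and do not reflect any real feed's schema. `parse()` is a static, side-effect-free function, so a test can hand it a fixture string directly.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional
from urllib.request import urlopen

@dataclass
class Indicator:
    type: str                      # ip | domain | url | sha256 | md5 | sha1 | email
    value: str
    source: str                    # comma-joined feed names
    threat_type: str = ""
    confidence: int = 0            # 0-100
    malware: Optional[str] = None
    techniques: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)

class ExampleC2Feed:
    """Hypothetical connector: a JSON array of {"ip", "malware", "confidence"} rows."""
    name = "example-c2"
    url = "https://feed.example.invalid/c2.json"

    def fetch_raw(self) -> str:
        # The only method that touches the network.
        with urlopen(self.url, timeout=30) as resp:
            return resp.read().decode("utf-8")

    @staticmethod
    def parse(raw: str) -> List[Indicator]:
        # Pure: raw payload in, indicators out.
        return [
            Indicator(type="ip", value=row["ip"], source=ExampleC2Feed.name,
                      threat_type="botnet_cc",
                      confidence=int(row.get("confidence", 50)),
                      malware=row.get("malware"))
            for row in json.loads(raw)
        ]

    def collect(self) -> List[Indicator]:
        return self.parse(self.fetch_raw())
```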
| + | ||
| + | ## Deduplication | |
| + | ||
| + | Feeds overlap heavily - the same C2 IP shows up in ThreatFox and Feodo, the same | |
| + | domain in OTX and URLhaus. Dedup keys on `(type, lowercased value)` and merges: | |
| + | ||
| + | - confidence becomes the max across sources | |
| + | - techniques and tags become the union | |
| + | - sources are concatenated, so provenance is preserved | |
| + | - malware family and first-seen are taken from the first source that had them | |
| + | ||
| + | The result is one record per indicator that's stronger than any single feed's view | |
| + | of it. | |
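The merge rules above translate almost line for line into a dictionary keyed on `(type, value.lower())`. In this sketch, `first_seen` is added to the dataclass as an assumption, since the model listing doesn't show it. Source names are concatenated without repeating a feed that is already present. Inputs are copied rather than mutated.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Iterable, List, Optional, Tuple

@dataclass
class Indicator:
    type: str
    value: str
    source: str
    confidence: int = 0
    malware: Optional[str] = None
    first_seen: Optional[str] = None  # assumed field; not in the model listing
    techniques: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)

def _union(a: List[str], b: List[str]) -> List[str]:
    """Order-preserving union of two lists."""
    return a + [x for x in b if x not in a]

def deduplicate(indicators: Iterable[Indicator]) -> List[Indicator]:
    merged: Dict[Tuple[str, str], Indicator] = {}
    for ind in indicators:
        key = (ind.type, ind.value.lower())
        cur = merged.get(key)
        if cur is None:
            # Copy so later merges never mutate the caller's objects.
            merged[key] = replace(ind, techniques=list(ind.techniques), tags=list(ind.tags))
            continue
        cur.source = ",".join(_union(cur.source.split(","), ind.source.split(",")))
        cur.confidence = max(cur.confidence, ind.confidence)
        cur.techniques = _union(cur.techniques, ind.techniques)
        cur.tags = _union(cur.tags, ind.tags)
        # First source that had a value wins.
        cur.malware = cur.malware or ind.malware
        cur.first_seen = cur.first_seen or ind.first_seen
    return list(merged.values())
```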
| + | ||
| + | ## TTP extraction | |
| + | ||
| + | Techniques come from three places, in order of trust: | |
| + | ||
| + | 1. **Directly from the feed** - OTX pulses carry `attack_ids`, which are used as-is. | |
| + | 2. **From the malware family** - a CobaltStrike or AgentTesla indicator maps to that | |
| + | family's common techniques (`src/cti/mitre.py`). | |
| + | 3. **From the threat type** - a phishing URL maps to T1566.002 / T1204.001 even when | |
| + | no family is named. | |
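A sketch of that precedence, assuming feed-supplied IDs are used as-is when present and otherwise the family and threat-type mappings are combined. The two lookup tables are illustrative entries only, not the contents of `src/cti/mitre.py`. Only the phishing mapping comes from the text above.

```python
from typing import Dict, List, Optional

# Illustrative tables; the real mappings live in src/cti/mitre.py and may differ.
FAMILY_TECHNIQUES: Dict[str, List[str]] = {
    "cobaltstrike": ["T1071.001", "T1055"],
    "agenttesla": ["T1056.001", "T1555"],
}
THREAT_TYPE_TECHNIQUES: Dict[str, List[str]] = {
    "phishing": ["T1566.002", "T1204.001"],
    "botnet_cc": ["T1071"],
}

def techniques_for(attack_ids: Optional[List[str]], malware: Optional[str],
                   threat_type: str) -> List[str]:
    # 1. Feed-supplied IDs (e.g. OTX attack_ids) are trusted as-is, minus duplicates.
    if attack_ids:
        return list(dict.fromkeys(attack_ids))
    inferred: List[str] = []
    # 2. Malware family, normalized so "Cobalt Strike" and "CobaltStrike" match.
    if malware:
        inferred += FAMILY_TECHNIQUES.get(malware.lower().replace(" ", ""), [])
    # 3. Threat type, which applies even when no family is named.
    inferred += THREAT_TYPE_TECHNIQUES.get(threat_type, [])
    return list(dict.fromkeys(inferred))  # dedupe, keep first-seen order
```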
| + | ||
| + | Extraction then collapses these across all indicators into a ranked | |
| + | `Technique` list with per-technique indicator counts and contributing sources, which | |
| + | becomes the coverage report attached to every bundle. | |
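The collapse step is a count over deduplicated indicators. This sketch assumes each indicator contributes its technique list and comma-joined sources, and that ties in the ranking are broken by technique ID. The `Technique` field names and the Markdown table layout for `ttp_coverage.md` are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple

@dataclass
class Technique:
    technique_id: str
    indicator_count: int
    sources: List[str]

def rank_techniques(rows: Iterable[Tuple[List[str], str]]) -> List[Technique]:
    """rows: (techniques, comma-joined sources), one per deduplicated indicator."""
    counts: Counter = Counter()
    sources: Dict[str, Set[str]] = defaultdict(set)
    for techniques, source in rows:
        for tid in set(techniques):  # count each indicator once per technique
            counts[tid] += 1
            sources[tid].update(s for s in source.split(",") if s)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [Technique(tid, n, sorted(sources[tid])) for tid, n in ranked]

def coverage_markdown(techniques: List[Technique]) -> str:
    """Render the ranked list as a Markdown table."""
    lines = ["| Technique | Indicators | Sources |", "|---|---|---|"]
    for t in techniques:
        lines.append(f"| {t.technique_id} | {t.indicator_count} | {', '.join(t.sources)} |")
    return "\n".join(lines) + "\n"
```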
| + | ||
| + | ## Rule generation | |
| + | ||
| + | Two artifacts come out of a bundle: | |
| + | ||
| + | **CDB lists** - one file per indicator type (`cti-malicious-ip`, | |
| + | `cti-malicious-domain`, `cti-malicious-url`, `cti-malware-hash`, `cti-leaked-email`), | |
| + | in Wazuh's `key:value` format where the value is the malware family or threat type. | |
| + | These are the actual lookup tables. | |
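A sketch of the list builder, mapping each indicator type to its list name as above (all three hash types share `cti-malware-hash`). One assumption deserves a check against your manager version: keys containing a colon (every URL, any IPv6 address) are written double-quoted, following the quoted-key form in Wazuh's CDB list documentation. The helper names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

LIST_FOR_TYPE = {
    "ip": "cti-malicious-ip",
    "domain": "cti-malicious-domain",
    "url": "cti-malicious-url",
    "sha256": "cti-malware-hash",
    "md5": "cti-malware-hash",
    "sha1": "cti-malware-hash",
    "email": "cti-leaked-email",
}

def cdb_line(key: str, value: str) -> str:
    # Assumption: keys containing ':' (URLs, IPv6) must be double-quoted.
    if ":" in key:
        key = f'"{key}"'
    return f"{key}:{value}"

def build_cdb_lists(rows: Iterable[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """rows: (type, value, malware family or threat type) -> list name -> sorted lines."""
    lists: Dict[str, Dict[str, str]] = defaultdict(dict)
    for ind_type, value, label in rows:
        name = LIST_FOR_TYPE.get(ind_type)
        if name is None:
            continue  # unknown type: no list to put it in
        lists[name].setdefault(value, label)  # first label wins for a repeated key
    return {name: [cdb_line(k, v) for k, v in sorted(entries.items())]
            for name, entries in lists.items()}

def write_cdb_lists(lists: Dict[str, List[str]], lists_dir: Path) -> None:
    lists_dir.mkdir(parents=True, exist_ok=True)
    for name, lines in lists.items():
        (lists_dir / name).write_text("\n".join(lines) + "\n")
```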
| + | ||
| + | **XML rules** - list-lookup rules (one per type present) that fire when a field | |
| + | matches an entry in the corresponding CDB list. Each rule is tagged with the | |
| + | techniques that dominate that bucket, so an alert arrives already mapped to ATT&CK. | |
| + | Rule IDs start at a configurable base (default 100300) to stay clear of the bundled | |
| + | Wazuh ruleset and the lab's hand-written rules. | |
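A sketch of the ruleset builder using `xml.etree.ElementTree`. The `<list field=... lookup=...>` and `<mitre><id>` elements follow Wazuh's rule syntax, and IDs count up from the 100300 base. The per-list field names are hypothetical placeholders to match against your decoders (only `srcip` is a standard Wazuh field). Wazuh's own CDB examples also scope the lookup under an `<if_group>` or `<if_sid>` parent, which is omitted here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hypothetical (field, lookup) per list; match the fields to your decoders.
LOOKUPS: Dict[str, Tuple[str, str]] = {
    "cti-malicious-ip": ("srcip", "address_match_key"),
    "cti-malicious-domain": ("data.domain", "match_key"),
    "cti-malicious-url": ("data.url", "match_key"),
    "cti-malware-hash": ("data.sha256", "match_key"),
    "cti-leaked-email": ("data.email", "match_key"),
}

def build_ruleset(dominant: Dict[str, List[str]], base_id: int = 100300,
                  level: int = 10) -> str:
    """dominant: list name -> top ATT&CK IDs for that bucket (only lists present)."""
    group = ET.Element("group", name="cti,")
    for offset, list_name in enumerate(sorted(dominant)):
        field, lookup = LOOKUPS[list_name]
        rule = ET.SubElement(group, "rule", id=str(base_id + offset), level=str(level))
        ET.SubElement(rule, "list", field=field, lookup=lookup).text = f"etc/lists/{list_name}"
        ET.SubElement(rule, "description").text = f"CTI: indicator matched {list_name}"
        if dominant[list_name]:
            mitre = ET.SubElement(rule, "mitre")
            for tid in dominant[list_name]:
                ET.SubElement(mitre, "id").text = tid
    ET.indent(group)  # pretty-print; needs Python 3.9+
    return ET.tostring(group, encoding="unicode") + "\n"
```

Sorting the list names keeps rule IDs stable between runs as long as the same lists are present, which keeps the review diff readable.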
| + | ||
| + | ## Staging and promotion | |
| + | ||
| + | `run` writes a candidate under `output/candidates/<bundle-id>/` with the lists, the | |
| + | rules, the coverage report, and a manifest carrying the diff against the last | |
| + | approved bundle. Nothing is active yet. `promote` (called when the analyst approves) | |
| + | copies the candidate into `output/active/`, records the indicator set as the new | |
| + | baseline for the next diff, and - if `wazuh_etc_dir` is configured - writes the lists | |
| + | and rules into the Wazuh manager's directories. | |
| + | ||
| + | ## Why a human in the loop | |
| + | ||
| + | Auto-generated detections from public feeds are a false-positive risk: a shared | |
| + | CDN IP, a sinkholed domain, a hash collision in a list. The approval gate means a | |
| + | bad bundle is a rejected email, not a pager storm. The cost is a few minutes of | |
| + | analyst time per run, which is cheap next to chasing phantom alerts across the fleet. |