# SIEM integration

Oversight registries record every beacon callback (DNS, HTTP pixel, OCSP,
license check) in the local SQLite `events` table and append a signed entry
to the transparency log. Security teams running Splunk, Microsoft Sentinel,
or an Elastic Common Schema stack want that data in the same pipeline as
the rest of their telemetry. The `oversight_core.siem` module and the
`oversight siem export` CLI handle the formatting and the minimal transport
that gets events from the registry to the SIEM.

The module is pure Python and stdlib-only for the formatters. HTTP
transport reuses the `httpx` client already in the dependency set. No SIEM
vendor SDK is required, and no vendor-specific credential lives in the
Oversight process unless the operator configures one.

## Event shape

The exporter emits one normalized record per row of the `events` table. The
registry identifier is typically the registry's own hex-encoded Ed25519
public key, so federated operators are distinguishable in SIEM dashboards.

| Field                 | Source column                                          |
|-----------------------|--------------------------------------------------------|
| `event_id`            | `events.id`                                            |
| `event_kind`          | `events.kind` (`dns`, `http_img`, `ocsp`, `license`)   |
| `occurred_unix`       | `events.timestamp`                                     |
| `occurred_at`         | derived from `occurred_unix` (RFC 3339, UTC)           |
| `registry_id`         | caller-supplied                                        |
| `token_id`            | `events.token_id`                                      |
| `file_id`             | `events.file_id`                                       |
| `recipient_id`        | `events.recipient_id`                                  |
| `issuer_id`           | `events.issuer_id`                                     |
| `source_ip`           | `events.source_ip`                                     |
| `user_agent`          | `events.user_agent`                                    |
| `qualified_timestamp` | `events.qualified_timestamp` (RFC 3161)                |
| `tlog_index`          | `events.tlog_index`                                    |
| `extra`               | `events.extra` (JSON)                                  |

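To make the record shape concrete, here is a minimal sketch of the normalized record as a dataclass, including the derivation of `occurred_at` from `occurred_unix`. The class name `NormalizedEvent`, the helper `rfc3339_utc`, the optional/default choices, and the column types are assumptions for illustration; the module's own types may differ.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


def rfc3339_utc(unix_ts: int) -> str:
    """Render whole-second Unix time as an RFC 3339 UTC string ending in Z."""
    dt = datetime.fromtimestamp(unix_ts, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


@dataclass
class NormalizedEvent:  # hypothetical name, not necessarily the module's class
    event_id: int
    event_kind: str          # "dns", "http_img", "ocsp", or "license"
    occurred_unix: int
    registry_id: str         # caller-supplied
    token_id: Optional[str] = None
    file_id: Optional[str] = None
    recipient_id: Optional[str] = None
    issuer_id: Optional[str] = None
    source_ip: Optional[str] = None
    user_agent: Optional[str] = None
    qualified_timestamp: Optional[str] = None
    tlog_index: Optional[int] = None
    extra: dict = field(default_factory=dict)

    def to_record(self) -> dict[str, Any]:
        """Return a plain dict with the derived occurred_at field added."""
        record = asdict(self)
        record["occurred_at"] = rfc3339_utc(self.occurred_unix)
        return record


event = NormalizedEvent(
    event_id=1, event_kind="dns", occurred_unix=1700000000, registry_id="ab12"
)
print(event.to_record()["occurred_at"])  # 2023-11-14T22:13:20Z
```

Deriving `occurred_at` at export time rather than storing it keeps the Unix timestamp as the single source of truth.
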
## CLI

```
oversight siem export \
  --db /var/lib/oversight/registry.sqlite \
  --format splunk|ecs|sentinel \
  --registry-id <ed25519_pub hex or short id> \
  [--since <unix_ts>] \
  [--limit N] \
  [--output -|/path/to/file.jsonl|https://collector.example/endpoint] \
  [--header 'Authorization: Splunk <hec-token>']
```

The default output is `-` (stdout, JSON lines). When `--output` is a file
path, forwarders such as the Splunk Universal Forwarder, Azure Monitor
Agent, or Filebeat can tail the file directly; no Oversight-side credential
is required. When `--output` is an HTTP URL, the CLI POSTs a JSON array and
fails loudly on any non-2xx response so a backoff wrapper can retry.

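A backoff wrapper of the kind mentioned above could look like the following sketch. The function names `run_with_backoff` and `export_once` are hypothetical, and the sketch assumes that "fails loudly" means the CLI exits with a non-zero status, which this document does not state explicitly.

```python
import random
import subprocess
import time
from typing import Callable, Sequence


def run_with_backoff(
    run_once: Callable[[], bool],
    attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call run_once until it returns True or the attempts are exhausted."""
    for attempt in range(attempts):
        if run_once():
            return True
        if attempt < attempts - 1:
            # Exponential backoff with full jitter, capped at max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
    return False


def export_once(argv: Sequence[str]) -> bool:
    """Run the export command once.

    Assumption: a zero exit status means the POST was accepted.
    """
    return subprocess.run(list(argv)).returncode == 0


# Example wiring (not executed here):
# ok = run_with_backoff(lambda: export_once([
#     "oversight", "siem", "export",
#     "--db", "registry.sqlite", "--registry-id", "ab12",
#     "--format", "splunk",
#     "--output", "https://collector.example/endpoint",
# ]))
```

Jitter spreads retries out so that several registries failing at the same moment do not hit the collector in lockstep.
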
### Splunk HTTP Event Collector

Send the events over HEC:

```
oversight siem export --db registry.sqlite --registry-id $REG \
  --format splunk \
  --output https://splunk.example:8088/services/collector/event \
  --header 'Authorization: Splunk 00000000-0000-0000-0000-000000000000'
```

`source` and `sourcetype` default to `oversight:registry` and
`oversight:beacon`. Override them with `--splunk-source`,
`--splunk-sourcetype`, and `--splunk-index` to match your deployment's field
extraction.

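For orientation, this sketch wraps a normalized record in the standard HEC event envelope (`time`, `source`, `sourcetype`, optional `index`, `event`). The function name `to_hec_envelope` is hypothetical, and the exact envelope that `--format splunk` produces may carry more or different keys; only the two defaults come from this document.

```python
import json
from typing import Any, Optional


def to_hec_envelope(
    record: dict[str, Any],
    source: str = "oversight:registry",
    sourcetype: str = "oversight:beacon",
    index: Optional[str] = None,
) -> dict[str, Any]:
    """Wrap one normalized record in a Splunk HEC event envelope."""
    envelope: dict[str, Any] = {
        "time": record["occurred_unix"],  # HEC expects epoch seconds
        "source": source,
        "sourcetype": sourcetype,
        "event": record,
    }
    if index is not None:
        # Leaving index out lets the HEC token's default index apply.
        envelope["index"] = index
    return envelope


record = {"event_id": 1, "event_kind": "dns", "occurred_unix": 1700000000}
print(json.dumps(to_hec_envelope(record, index="security")))
```
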
### Microsoft Sentinel (Log Analytics Data Collector API)

The Data Collector API requires an HMAC-SHA256 `Authorization` header.
`oversight_core.siem.sentinel_authorization` computes it; the CLI does not
yet sign requests on your behalf because the signature covers both the
RFC 1123 `x-ms-date` header and the content length, so the signed length
must match the request body exactly and the signed date must be the one
actually sent. For production, write the records to a file and have Azure
Monitor Agent pick them up, or wrap the signing in a small adapter:

```python
import json
from datetime import datetime, timezone

from oversight_core.siem import (
    HTTPJSONSink,
    export_events,
    iter_registry_events,
    sentinel_authorization,
    to_sentinel,  # assumption: importable from here and callable on one event
)

# REG, WORKSPACE_ID, and SHARED_KEY are placeholders supplied by the operator.
events = list(iter_registry_events("registry.sqlite", registry_id=REG))

# Pre-format and serialize first: the signature covers the exact body length.
batch = [to_sentinel(e) for e in events]
body = json.dumps(batch).encode("utf-8")

# RFC 1123 date (assumes a C/English locale for %a and %b). The same string
# is signed below and sent as the x-ms-date header.
date = datetime.now(timezone.utc).strftime("%a, %d %b %Y %H:%M:%S GMT")
auth = sentinel_authorization(
    workspace_id=WORKSPACE_ID,
    shared_key_b64=SHARED_KEY,
    content_length=len(body),
    date_rfc1123=date,
)
sink = HTTPJSONSink(
    f"https://{WORKSPACE_ID}.ods.opinsights.azure.com/api/logs?api-version=2016-04-01",
    headers={
        "Authorization": auth,
        "Log-Type": "Oversight",
        "x-ms-date": date,
        "time-generated-field": "TimeGenerated",
    },
)
# The bytes posted through the sink must be identical to `body`; otherwise the
# signed content length no longer matches and the API rejects the request.
```

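For readers who want to see what the header contains, here is a standard-library sketch of the shared-key signature as the Data Collector API documents it: an HMAC-SHA256 over the method, content length, content type, `x-ms-date` line, and resource path. The function name `build_shared_key_header` is hypothetical and this is not presented as the implementation of `sentinel_authorization`; it only illustrates why the content length and date must match what is sent.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate


def build_shared_key_header(
    workspace_id: str,
    shared_key_b64: str,
    content_length: int,
    date_rfc1123: str,
    content_type: str = "application/json",
    resource: str = "/api/logs",
) -> str:
    """Return 'SharedKey <workspace>:<signature>' for a POST to the API."""
    string_to_sign = (
        f"POST\n{content_length}\n{content_type}\n"
        f"x-ms-date:{date_rfc1123}\n{resource}"
    )
    key = base64.b64decode(shared_key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return f"SharedKey {workspace_id}:{signature}"


# Demo values only; a real workspace key comes from the Log Analytics portal.
demo_key = base64.b64encode(b"demo-shared-key").decode("ascii")
body = b'[{"BeaconKind": "dns"}]'
date = formatdate(usegmt=True)  # locale-independent RFC 1123 date in GMT
print(build_shared_key_header("demo-workspace", demo_key, len(body), date))
```

Because the length and date are inside the signed string, changing either after signing invalidates the header, which is why the adapter above serializes the body before computing the authorization.
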
With `Log-Type: Oversight`, the records land in the custom log table
`Oversight_CL` once Sentinel has ingested the first batch. Each beacon kind
surfaces as a value of the `BeaconKind` column, so a single
`Oversight_CL | where BeaconKind == "dns"` query pulls every DNS callback.
Joins against your existing identity tables key off `RecipientId` or
`IssuerId`. The Data Collector API appends a type suffix to custom column
names (`_s` for strings), so check the ingested schema and use the suffixed
name (for example `BeaconKind_s`) if the plain name does not resolve.

### Elastic Common Schema

ECS 8.x-compatible records are ready to index into Elasticsearch or ship
through Filebeat. The schema sets `event.module = "oversight"` and
`event.dataset = "oversight.beacon"` so the Elastic Security app renders
the events without extra mapping work.

```
oversight siem export --db registry.sqlite --registry-id $REG \
  --format ecs \
  --output /var/log/oversight/events.ndjson
```

Point Filebeat at the file with a `filestream` input whose `ndjson` parser
sets `target: ""` and `overwrite_keys: true`, so the decoded keys land at
the document root and the embedded `@timestamp` replaces Filebeat's read
time. The custom namespace at `oversight.*` preserves the protocol-native
fields for runtime fields or Lens visualizations.

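As a rough illustration of the mapping, this sketch turns a normalized record into an ECS-style document. Only `event.module`, `event.dataset`, and the `oversight.*` namespace come from this document; the function name `to_ecs_sketch`, the other ECS fields chosen (`event.action`, `event.id`, `source.ip`, `user_agent.original`), and the keys placed under `oversight` are assumptions, so the module's actual output may differ.

```python
from typing import Any

# Protocol-native fields kept under the custom namespace (assumed selection).
OVERSIGHT_KEYS = (
    "registry_id", "token_id", "file_id",
    "recipient_id", "issuer_id", "tlog_index",
)


def to_ecs_sketch(record: dict[str, Any]) -> dict[str, Any]:
    """Map one normalized record to an ECS-style document."""
    doc: dict[str, Any] = {
        "@timestamp": record["occurred_at"],
        "event": {
            "module": "oversight",
            "dataset": "oversight.beacon",
            "kind": "event",
            "action": record["event_kind"],
            "id": str(record["event_id"]),
        },
        "oversight": {
            key: record[key]
            for key in OVERSIGHT_KEYS
            if record.get(key) is not None
        },
    }
    # Only emit optional ECS objects when the registry recorded a value.
    if record.get("source_ip"):
        doc["source"] = {"ip": record["source_ip"]}
    if record.get("user_agent"):
        doc["user_agent"] = {"original": record["user_agent"]}
    return doc


example = to_ecs_sketch({
    "event_id": 1,
    "event_kind": "http_img",
    "occurred_at": "2023-11-14T22:13:20Z",
    "registry_id": "ab12",
    "token_id": "tok-1",
    "source_ip": "203.0.113.7",
})
print(example["event"]["dataset"], example["source"]["ip"])
```
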
## Honest limits

Absence of a beacon is not evidence of no leak. Corporate egress filtering,
air-gapped readers, and sandboxed previews suppress beacon callbacks.
Oversight records what it sees; SIEM alerting on the absence of beacons
needs a baseline and an explicit policy, not just the event stream.
`docs/security.md` and the research threat model cover the details.

## Fields you may want to rename on your side

- `token_id` is the public beacon token, not an authentication token; renaming
  it to `beacon_token` in your dashboards avoids confusion with OAuth tokens.
- `file_id` is the Oversight-internal content identifier, not a hash. Map it to
  your DLP system's `document_id` or equivalent.
- `recipient_id` and `issuer_id` map to whatever identity scheme the Oversight
  deployment uses (email, SSO subject, Ed25519 fingerprint).
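
If you prefer to rename before ingestion rather than in dashboards, a small post-processing step is enough. The sketch below uses the target names suggested in the list above; the `RENAMES` table and the `rename_fields` helper are hypothetical and not part of `oversight_core.siem`.

```python
from typing import Any

# Suggested renames from the list above; adjust to your own schema.
RENAMES = {
    "token_id": "beacon_token",
    "file_id": "document_id",
}


def rename_fields(record: dict[str, Any], renames: dict[str, str] = RENAMES) -> dict[str, Any]:
    """Return a copy of the record with selected keys renamed."""
    return {renames.get(key, key): value for key, value in record.items()}


print(rename_fields({"token_id": "tok-1", "file_id": "f-9", "event_kind": "dns"}))
```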