Zion Boggan
repos/Oversight/docs/SIEM.md
# SIEM integration
 
Oversight registries record every beacon callback (DNS, HTTP pixel, OCSP,
license check) in the local SQLite `events` table and append a signed entry
to the transparency log. Security teams running Splunk, Microsoft Sentinel,
or an Elastic Common Schema stack want that data in the same pipeline as
the rest of their telemetry. The `oversight_core.siem` module and the
`oversight siem export` CLI handle the formatting and the minimal transport
that gets events from the registry to the SIEM.
 
The module is pure Python and stdlib-only for the formatters. HTTP
transport reuses the `httpx` client already in the dependency set. No SIEM
vendor SDK is required, and no vendor-specific credential lives in the
Oversight process unless the operator configures one.
 
## Event shape
 
The exporter emits one normalized record per row of the `events` table. The
registry identifier is typically the hex encoding of the registry's own
Ed25519 public key, so federated operators are distinguishable in SIEM
dashboards.
 
| Field                 | Source column                                        |
|-----------------------|------------------------------------------------------|
| `event_id`            | `events.id`                                          |
| `event_kind`          | `events.kind` (`dns`, `http_img`, `ocsp`, `license`) |
| `occurred_unix`       | `events.timestamp`                                   |
| `occurred_at`         | derived RFC 3339 UTC                                 |
| `registry_id`         | caller-supplied                                      |
| `token_id`            | `events.token_id`                                    |
| `file_id`             | `events.file_id`                                     |
| `recipient_id`        | `events.recipient_id`                                |
| `issuer_id`           | `events.issuer_id`                                   |
| `source_ip`           | `events.source_ip`                                   |
| `user_agent`          | `events.user_agent`                                  |
| `qualified_timestamp` | `events.qualified_timestamp` (RFC 3161)              |
| `tlog_index`          | `events.tlog_index`                                  |
| `extra`               | `events.extra` (JSON)                                |
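
As a rough illustration of the mapping above, the sketch below builds one normalized record from a row and derives `occurred_at` from `occurred_unix`. The `normalize_row` helper and the trimmed-down in-memory table are hypothetical and exist only for this example; the real `events` schema and the module's own formatter may differ.

```python
import json
import sqlite3
from datetime import datetime, timezone


def normalize_row(row: sqlite3.Row, registry_id: str) -> dict:
    """Map one `events` row to a normalized record (hypothetical helper)."""
    ts = int(row["timestamp"])
    # occurred_at is derived: the Unix timestamp rendered as RFC 3339 in UTC.
    occurred_at = datetime.fromtimestamp(ts, tz=timezone.utc).strftime(
        "%Y-%m-%dT%H:%M:%SZ"
    )
    return {
        "event_id": row["id"],
        "event_kind": row["kind"],
        "occurred_unix": ts,
        "occurred_at": occurred_at,
        "registry_id": registry_id,  # caller-supplied, not read from the row
        "token_id": row["token_id"],
        "source_ip": row["source_ip"],
        "extra": json.loads(row["extra"]) if row["extra"] else {},
    }


# Assumed subset of the schema, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT, timestamp INTEGER,"
    " token_id TEXT, source_ip TEXT, extra TEXT)"
)
conn.execute(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)",
    (1, "dns", 1700000000, "tok-1", "203.0.113.7", json.dumps({"qname": "x.example"})),
)
record = normalize_row(conn.execute("SELECT * FROM events").fetchone(), "reg-demo")
print(json.dumps(record, indent=2))
```

Keeping both `occurred_unix` and `occurred_at` lets a SIEM sort numerically while still showing a readable timestamp.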
 
## CLI
 
```
oversight siem export \
  --db /var/lib/oversight/registry.sqlite \
  --format splunk|ecs|sentinel \
  --registry-id <ed25519_pub hex or short id> \
  [--since <unix_ts>] \
  [--limit N] \
  [--output -|/path/to/file.jsonl|https://collector.example/endpoint] \
  [--header 'Authorization: Splunk <hec-token>']
```
 
The default output is `-` (stdout, JSON lines). Forwarders like the Splunk
Universal Forwarder, Azure Monitor Agent, or Filebeat can tail the file
output directly; no Oversight-side credential is required. When `--output`
is an HTTP URL, the CLI POSTs a JSON array and fails loudly on any non-2xx
response so a backoff wrapper can retry.
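
A backoff wrapper of the kind mentioned above could look like the following sketch. It assumes that "fails loudly" means the CLI exits with a non-zero status, which this document does not state explicitly. The `run_with_backoff` name and its parameters are hypothetical.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_backoff(
    cmd: Sequence[str],
    attempts: int = 5,
    base_delay: float = 1.0,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run cmd until it exits 0, doubling the delay after each failure.

    Returns the last exit code. `run` and `sleep` are injectable for testing.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    returncode = 1
    for attempt in range(attempts):
        returncode = run(list(cmd)).returncode
        if returncode == 0:
            return 0
        if attempt < attempts - 1:
            # 1x, 2x, 4x, ... the base delay between attempts.
            sleep(base_delay * (2 ** attempt))
    return returncode


# Example invocation (commented out so the sketch has no side effects):
# run_with_backoff([
#     "oversight", "siem", "export",
#     "--db", "registry.sqlite",
#     "--registry-id", "REG",
#     "--format", "splunk",
#     "--output", "https://splunk.example:8088/services/collector/event",
# ])
```

Because a failed run may or may not have delivered part of a batch, a retry can re-send events; deduplicating on `event_id` on the SIEM side is one way to tolerate that.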
 
### Splunk HTTP Event Collector
 
Send the events over HEC:
 
```
oversight siem export --db registry.sqlite --registry-id $REG \
  --format splunk \
  --output https://splunk.example:8088/services/collector/event \
  --header 'Authorization: Splunk 00000000-0000-0000-0000-000000000000'
```
 
`source` and `sourcetype` default to `oversight:registry` and
`oversight:beacon`, respectively. Override them with `--splunk-source` and
`--splunk-sourcetype`, and set `--splunk-index` to match your deployment's
field extraction.
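
For orientation, an HEC event is a JSON envelope whose metadata keys (`time`, `source`, `sourcetype`, `index`) sit alongside the payload under `event`. The sketch below wraps a normalized record that way. Whether `--format splunk` emits exactly this shape is an assumption, and `to_hec_envelope` is a hypothetical helper, not part of `oversight_core.siem`.

```python
import json
from typing import Optional


def to_hec_envelope(
    record: dict,
    source: str = "oversight:registry",
    sourcetype: str = "oversight:beacon",
    index: Optional[str] = None,
) -> dict:
    """Wrap a normalized record in a Splunk HEC event envelope (sketch)."""
    envelope = {
        "time": record["occurred_unix"],  # HEC takes epoch seconds
        "source": source,
        "sourcetype": sourcetype,
        "event": record,
    }
    if index is not None:
        # Omit the key entirely so the HEC token's default index applies.
        envelope["index"] = index
    return envelope


sample = {"event_id": 1, "event_kind": "dns", "occurred_unix": 1700000000}
print(json.dumps(to_hec_envelope(sample, index="security")))
```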
 
### Microsoft Sentinel (Log Analytics Data Collector API)
 
The Data Collector API requires an HMAC-SHA256 `Authorization` header.
`oversight_core.siem.sentinel_authorization` computes it; the CLI does not
yet sign requests on your behalf because the signature covers both the
exact body length and the RFC 1123 `x-ms-date` header, so it has to be
computed per request, after the batch is serialized.
For production, write the records to a file and have Azure Monitor Agent
pick them up, or wrap the signing in a small adapter:
 
```python
import json
from datetime import datetime, timezone

from oversight_core.siem import (
    iter_registry_events, export_events, HTTPJSONSink, sentinel_authorization,
)

# REG, WORKSPACE_ID, and SHARED_KEY are supplied by your deployment.
events = list(iter_registry_events("registry.sqlite", registry_id=REG))
# Pre-format so we know the content length.
batch = [e for e in events]  # ... format via to_sentinel
body = json.dumps([...]).encode("utf-8")
# The same date string must be signed and sent as the x-ms-date header.
date = datetime.now(timezone.utc).strftime("%a, %d %b %Y %H:%M:%S GMT")
auth = sentinel_authorization(
    workspace_id=WORKSPACE_ID,
    shared_key_b64=SHARED_KEY,
    content_length=len(body),
    date_rfc1123=date,
)
sink = HTTPJSONSink(
    f"https://{WORKSPACE_ID}.ods.opinsights.azure.com/api/logs?api-version=2016-04-01",
    headers={
        "Authorization": auth,
        "Log-Type": "Oversight",
        "x-ms-date": date,
        "time-generated-field": "TimeGenerated",
    },
)
```
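
To make the signing step concrete, here is a stdlib-only sketch of the shared-key scheme the Data Collector API documents: an HMAC-SHA256 over the method, content length, content type, `x-ms-date`, and resource path. It presumably mirrors what `sentinel_authorization` does, but that is an assumption; `build_shared_key_authorization` is a hypothetical name and the key below is a placeholder.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate


def build_shared_key_authorization(
    workspace_id: str,
    shared_key_b64: str,
    content_length: int,
    date_rfc1123: str,
    content_type: str = "application/json",
    resource: str = "/api/logs",
) -> str:
    """Return the `SharedKey <workspace>:<signature>` header value (sketch)."""
    string_to_sign = (
        f"POST\n{content_length}\n{content_type}\n"
        f"x-ms-date:{date_rfc1123}\n{resource}"
    )
    key = base64.b64decode(shared_key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return f"SharedKey {workspace_id}:{base64.b64encode(digest).decode('ascii')}"


# Placeholder credentials for demonstration only.
demo_key = base64.b64encode(b"not-a-real-key").decode("ascii")
demo_body = b'[{"BeaconKind": "dns"}]'
demo_date = formatdate(usegmt=True)  # locale-independent RFC 1123 date in GMT
demo_auth = build_shared_key_authorization(
    "00000000-0000-0000-0000-000000000000", demo_key, len(demo_body), demo_date
)
print(demo_auth.split(":", 1)[0])
```

`email.utils.formatdate(usegmt=True)` is used here instead of `strftime` because `%a` and `%b` follow the process locale, which can produce day and month names the API rejects.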
 
Once Sentinel ingests the first batch, the records are queryable in KQL
under the custom log name `Oversight_CL`. Each beacon kind surfaces as a
value of the `BeaconKind` column, so a single
`Oversight_CL | where BeaconKind == "dns"` query pulls every DNS callback.
Joins against your existing identity tables key off `RecipientId` or
`IssuerId`.
 
### Elastic Common Schema
 
ECS 8.x-compatible records are ready to index into Elasticsearch or ship
through Filebeat. The schema sets `event.module = "oversight"` and
`event.dataset = "oversight.beacon"` so the Elastic Security app renders
the events without extra mapping work.
 
```
oversight siem export --db registry.sqlite --registry-id $REG \
  --format ecs \
  --output /var/log/oversight/events.ndjson
```
 
Point Filebeat at the file with the `ndjson` parser, decode the keys into
the document root (`target: ""`), and set `overwrite_keys: true` so the
embedded `@timestamp` replaces the one Filebeat assigns at read time. The
custom namespace at `oversight.*` preserves the protocol-native fields for
runtime fields or Lens visualizations.
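
The shape of an ECS document for a beacon event might look like the sketch below. The `event.*`, `source.ip`, and `user_agent.original` fields are standard ECS; the sub-field names under `oversight.*` and the `to_ecs_sketch` helper are assumptions for illustration, not the module's actual output.

```python
import json


def to_ecs_sketch(record: dict) -> dict:
    """Map a normalized record to an ECS-style document (illustrative only)."""
    doc = {
        "@timestamp": record["occurred_at"],
        "event": {
            "module": "oversight",
            "dataset": "oversight.beacon",
            "kind": "event",
            "id": str(record["event_id"]),
        },
        # Custom namespace: keeps protocol-native fields out of ECS core fields.
        "oversight": {
            "beacon_kind": record["event_kind"],
            "token_id": record.get("token_id"),
            "registry_id": record.get("registry_id"),
        },
    }
    if record.get("source_ip"):
        doc["source"] = {"ip": record["source_ip"]}
    if record.get("user_agent"):
        doc["user_agent"] = {"original": record["user_agent"]}
    return doc


sample = {
    "event_id": 1,
    "event_kind": "http_img",
    "occurred_at": "2023-11-14T22:13:20Z",
    "token_id": "tok-1",
    "registry_id": "reg-demo",
    "source_ip": "203.0.113.7",
    "user_agent": "Mozilla/5.0",
}
# One JSON object per line is what an NDJSON file holds.
print(json.dumps(to_ecs_sketch(sample)))
```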
 
## Honest limits
 
Absence of a beacon is not evidence of no leak. Corporate egress filtering,
air-gapped readers, and sandboxed previews suppress beacon callbacks.
Oversight records what it sees; SIEM alerting on the absence of beacons
needs a baseline and an explicit policy, not just the event stream.
`docs/security.md` and the research threat model cover the details.
 
## Fields you may want to rename on your side
 
- `token_id` is the public beacon token, not an authentication token; renaming
  to `beacon_token` in your dashboards avoids confusion with OAuth scopes.
- `file_id` is the Oversight-internal content identifier, not a hash. Map to
  your DLP system's `document_id` or equivalent.
- `recipient_id` and `issuer_id` map to whatever identity scheme the Oversight
  deployment uses (email, SSO subject, Ed25519 fingerprint).
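
If you prefer to rename before ingestion rather than in dashboards, a small post-processing step over the JSON lines output is enough. The sketch below is one way to do it; the `RENAMES` table and `rename_fields` helper are hypothetical, and the target names are only the suggestions from the list above.

```python
import json

# Suggested renames from the list above; adjust to your own schema.
RENAMES = {
    "token_id": "beacon_token",
    "file_id": "document_id",
}


def rename_fields(record: dict, renames: dict = RENAMES) -> dict:
    """Return a copy of record with top-level keys renamed; others untouched."""
    return {renames.get(key, key): value for key, value in record.items()}


line = '{"event_id": 1, "token_id": "tok-1", "file_id": "f-9", "issuer_id": "a@example.com"}'
renamed = rename_fields(json.loads(line))
print(json.dumps(renamed))
```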