Zion Boggan
# OVERSIGHT Protocol Specification
 
**Sealed Entity, Notarized Trust, Integrity & Evidence Layer**

Version 0.5 - Draft - April 2026

---

## 1. Status

This document is a draft specification for an open protocol for data provenance, attribution, and leak detection. It is intended for eventual submission as a standards-track RFC following independent cryptographic review.
 
## 2. Goals and non-goals

### 2.1 Goals

The protocol MUST:

- Produce a file container format (`.sealed`) that wraps arbitrary payloads in an authenticated, recipient-bound cryptographic envelope.
- Allow post-quantum cryptographic agility without breaking existing sealed files.
- Bind every sealed file to a specific recipient identity via a signed manifest.
- Carry per-recipient watermarking identifiers that survive plaintext escape.
- Carry per-recipient passive beacon tokens that fire on open via standard rendering behaviors (DNS resolution, image fetch, certificate check) without executing code on the reader.
- Support distributed, jurisdiction-aware attribution registries.
- Produce evidence artifacts suitable as the foundation of a court-admissible chain-of-custody report.
- Be format-agnostic: the payload is opaque bytes; the protocol does not care whether it wraps DOCX, PDF, MP4, JSON, or raw bytes.
- Be open, reviewable, and free of proprietary dependencies.
 
### 2.2 Non-goals

The protocol does NOT:

- Execute code of any kind on the reader's machine. No active payloads. No RATs.
- Prevent all leaks. Plaintext, once decrypted, can be retyped, photographed, or OCR'd. The protocol's defense is attribution, not prevention.
- Provide DRM in the film-industry sense (playback restrictions, output protection). It provides attribution and revocation.
- Authenticate the truth of content. Like C2PA, OVERSIGHT proves who signed what for whom; it does not verify the claims in the content itself.
 
## 3. Threat model

### 3.1 Assumptions

- The issuer controls its signing keys and operates a registry (or delegates to a federated operator).
- The intended recipient controls its decryption keys.
- The network between recipient and registry is untrusted but standard TLS is available.
 
### 3.2 Adversaries

The protocol defends against:

| Adversary | Capability | Defense |
|-----------|------------|---------|
| Passive interceptor | Captures sealed file in transit | AEAD, recipient-bound DEK |
| Curious insider | Receives file, shares with third party | Per-recipient watermarking → attribution |
| Thief with wrong key | Steals sealed file, has no decryption key | ECDH/KEM unwrap fails |
| Tamperer | Modifies ciphertext or manifest | AEAD tag + manifest signature + content-hash verify |
| Format-conversion attacker | Decrypts, converts to PDF/screenshot, posts plaintext | Multi-layer watermarking; attribution via registry match |
| Metadata-stripping attacker | Re-serializes file to remove marks | Defeats L2+; L1 zero-width and L3 semantic marks survive |
| Nation-state with quantum computer (future) | Decrypts classical ciphertexts | Hybrid mode: ML-KEM + X25519 |
 
The protocol does NOT defend against:

- A fully air-gapped attacker who OCRs or retypes the document and distributes only the retyped copy. (Semantic/synonym watermarks are the only defense; they are probabilistic.)
- An attacker who compromises the issuer's signing key. (Key rotation and revocation logs are the mitigation.)
- An attacker who owns the registry infrastructure. (Use a federated/transparency-log registry; mitigate with jurisdictional profiles.)
 
## 4. Cryptographic primitives

### 4.1 Algorithm suites

Every sealed file declares a `suite` in its manifest. Implementations MUST reject unknown suites.

#### 4.1.1 `OSGT-CLASSIC-v1` (suite_id = 1)

- Key agreement: X25519 (RFC 7748)
- KDF: HKDF-SHA256 (RFC 5869), info = `"oversight-v1-dek-wrap"`
- AEAD: XChaCha20-Poly1305 (draft-irtf-cfrg-xchacha)
- Signature: Ed25519 (RFC 8032)
- Hash: SHA-256
 
#### 4.1.2 `OSGT-HYBRID-v1` (suite_id = 2)

All primitives of CLASSIC-v1, plus:

- KEM: ML-KEM-768 (FIPS 203), combined with X25519 using hybrid KDF
- Signature: ML-DSA-65 (FIPS 204), combined with Ed25519 (dual signatures)

Hybrid key establishment combines the two shared secrets:

```
hybrid_ss = HKDF-SHA256(
    salt = "oversight-hybrid-v1",
    ikm  = x25519_ss || mlkem_ss,
    info = "oversight-hybrid-dek-wrap",
    len  = 32
)
```

Hybrid signatures attach both signatures to the manifest. Verification requires BOTH to validate.
 
#### 4.1.3 `OSGT-HW-P256-v1` (suite_id = 3)

For recipients whose private key lives in a PIV-compatible hardware token
(YubiKey, Nitrokey, OnlyKey). The token performs the ECDH on-device; the
private scalar never leaves the device.

- Key agreement: ECDH on NIST P-256 (FIPS 186-5 / SEC1)
- Recipient public key: SEC1 uncompressed encoding (65 bytes,
  `0x04 || X || Y`); recorded in the manifest as `recipient.p256_pub` (hex)
- KDF: HKDF-SHA256 (RFC 5869), `salt = None`, `info = "oversight-hw-p256-v1-dek-wrap"`
- AEAD: XChaCha20-Poly1305, `aad = "oversight-hw-p256-dek"`
- Signature: Ed25519 (issuer); the recipient's hardware suite does not
  affect the issuer signature path
- Hash: SHA-256

The `wrapped_dek` JSON for this suite is:

```json
{
  "suite": "OSGT-HW-P256-v1",
  "ephemeral_pub": "<hex of SEC1 uncompressed P-256 ephemeral pubkey, 65 bytes>",
  "nonce": "<hex, 24 bytes>",
  "wrapped_dek": "<hex, AEAD ciphertext including 16-byte tag>"
}
```

The sender holds no hardware key. The ephemeral keypair is generated locally
in software; only the recipient's public key needs to come off the token
(typically via PKCS#11 `C_GetAttributeValue` once at recipient enrollment).

P-256 was chosen over X25519 for compatibility with the broadest set of PIV
deployments. PIV slots historically support only P-256 and P-384; YubiKey
5.7+ adds Curve25519 over the OpenPGP applet but PIV itself does not.
Cryptographic strength is unchanged; both X25519 and P-256 ECDH offer
~128-bit security.
 
### 4.2 Custom cryptography is PROHIBITED

Implementations MUST NOT introduce new cryptographic primitives. The suite identifiers are reserved; new suites may only be added via specification update after independent review.
 
## 5. Container format

### 5.1 Wire layout

All integers are unsigned big-endian.

```
offset  length    field              notes
------  --------  -----------------  ---------------------------------
0       6         magic              0x53 0x4E 0x54 0x4C 0x01 0x00  ("OSGT\x01\x00")
6       1         format_version     MUST be 0x01
7       1         suite_id           1 = CLASSIC_v1, 2 = HYBRID_v1, 3 = HW_P256_v1
8       4         manifest_len       length of manifest JSON in bytes
12      M         manifest           canonical JSON (signed)
12+M    4         wrapped_dek_len
...     W         wrapped_dek        JSON; per-suite shape (see 5.2)
...     24        aead_nonce         XChaCha20-Poly1305 nonce
...     4         ciphertext_len
...     C         ciphertext         AEAD output, includes 16-byte tag
```

Implementations MUST reject any `.sealed` file whose unsigned `suite_id`
header does not match the signed `manifest.suite` value, and MUST reject
trailing bytes after the declared ciphertext region.
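
The layout and the two rejection rules above can be sketched as a small parser. This is a minimal illustration using only the Python standard library, not the reference implementation: `parse_sealed` and `SUITE_NAMES` are hypothetical names, the magic is copied from the hex bytes in the layout table, and the sketch assumes `manifest.suite` carries the suite strings from Section 4.1. Manifest signature verification is out of scope here.

```python
import json
import struct

# Hex bytes exactly as listed in the wire-layout table.
MAGIC = bytes([0x53, 0x4E, 0x54, 0x4C, 0x01, 0x00])
# Assumption: manifest.suite uses the suite strings from Section 4.1.
SUITE_NAMES = {1: "OSGT-CLASSIC-v1", 2: "OSGT-HYBRID-v1", 3: "OSGT-HW-P256-v1"}


def parse_sealed(blob: bytes) -> dict:
    """Split a .sealed blob into its fields, rejecting malformed input."""
    if len(blob) < 12 or blob[:6] != MAGIC:
        raise ValueError("bad magic or truncated header")
    # ">" = big-endian, no padding: two 1-byte fields then a 4-byte length.
    format_version, suite_id, manifest_len = struct.unpack_from(">BBI", blob, 6)
    if format_version != 0x01:
        raise ValueError("unsupported format_version")
    if suite_id not in SUITE_NAMES:
        raise ValueError("unknown suite_id")

    pos = 12

    def take(n: int) -> bytes:
        """Consume exactly n bytes or fail."""
        nonlocal pos
        if pos + n > len(blob):
            raise ValueError("truncated container")
        chunk = blob[pos:pos + n]
        pos += n
        return chunk

    manifest_bytes = take(manifest_len)
    (wrapped_len,) = struct.unpack(">I", take(4))
    wrapped_dek = take(wrapped_len)
    aead_nonce = take(24)
    (ciphertext_len,) = struct.unpack(">I", take(4))
    ciphertext = take(ciphertext_len)
    if pos != len(blob):
        raise ValueError("trailing bytes after ciphertext")

    manifest = json.loads(manifest_bytes)
    if not isinstance(manifest, dict):
        raise ValueError("manifest is not a JSON object")
    if manifest.get("suite") != SUITE_NAMES[suite_id]:
        raise ValueError("suite_id header does not match manifest.suite")

    return {
        "suite_id": suite_id,
        "manifest": manifest,
        "manifest_bytes": manifest_bytes,  # keep raw bytes for signature checks
        "wrapped_dek": wrapped_dek,
        "aead_nonce": aead_nonce,
        "ciphertext": ciphertext,
    }
```

Returning the raw `manifest_bytes` alongside the parsed object lets a caller verify the signature over the exact bytes that were on the wire.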
 
### 5.2 `wrapped_dek` JSON shape per suite

The `wrapped_dek` byte range holds a canonical-JSON object whose fields
depend on the manifest's declared `suite`. All byte values are lowercase
hex unless otherwise noted.

#### `OSGT-CLASSIC-v1`

```json
{
  "ephemeral_pub": "<32-byte X25519 ephemeral public key>",
  "nonce":         "<24-byte XChaCha20-Poly1305 nonce>",
  "wrapped_dek":   "<DEK ciphertext + 16-byte tag>"
}
```

KDF: `HKDF-SHA256(salt=None, ikm=ss_x, info="oversight-v1-dek-wrap", L=32)`.
AAD on `wrapped_dek`: `"oversight-dek"`.
 
#### `OSGT-HYBRID-v1`

```json
{
  "suite":                "OSGT-HYBRID-v1",
  "x25519_ephemeral_pub": "<32-byte X25519 ephemeral public key>",
  "mlkem_ciphertext":     "<1088-byte ML-KEM-768 ciphertext>",
  "nonce":                "<24-byte XChaCha20-Poly1305 nonce>",
  "wrapped_dek":          "<DEK ciphertext + 16-byte tag>"
}
```

KDF: `HKDF-SHA256(salt=None, ikm=ss_x || ss_pq || x25519_eph_pub || mlkem_ct,
info="oversight-hybrid-v1-dek-wrap", L=32)`. AAD on `wrapped_dek`:
`"oversight-hybrid-dek"`. The X-wing-style binding covers both shared
secrets and both ephemeral inputs, so an attacker cannot substitute a
valid-but-different ciphertext without changing the derived key.
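
As an illustration of the KDF line above, the following sketch implements HKDF-SHA256 (RFC 5869) with the standard library and feeds it the concatenated hybrid inputs in the order given in this subsection. The function names are hypothetical, the shared secrets are assumed to come from X25519 and ML-KEM-768 libraries that are not shown, and the length checks are an added sanity measure rather than a requirement stated here.

```python
import hashlib
import hmac
from typing import Optional


def hkdf_sha256(ikm: bytes, info: bytes, length: int = 32,
                salt: Optional[bytes] = None) -> bytes:
    """HKDF-SHA256 per RFC 5869: extract a PRK, then expand to `length` bytes."""
    if salt is None:
        salt = b"\x00" * hashlib.sha256().digest_size  # RFC 5869 default salt
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        # T(n) = HMAC(PRK, T(n-1) || info || n)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_hybrid_kek(ss_x: bytes, ss_pq: bytes,
                      x25519_eph_pub: bytes, mlkem_ct: bytes) -> bytes:
    """Derive the 32-byte key-encryption key for OSGT-HYBRID-v1 as described above."""
    if len(ss_x) != 32 or len(x25519_eph_pub) != 32:
        raise ValueError("X25519 shared secret and public key must be 32 bytes")
    if len(ss_pq) != 32 or len(mlkem_ct) != 1088:
        raise ValueError("unexpected ML-KEM-768 shared secret or ciphertext size")
    ikm = ss_x + ss_pq + x25519_eph_pub + mlkem_ct
    return hkdf_sha256(ikm, info=b"oversight-hybrid-v1-dek-wrap", length=32)
```

Because the ephemeral public key and the ML-KEM ciphertext are part of the input keying material, changing either one yields a different key even when both shared secrets stay the same.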
 
#### `OSGT-HW-P256-v1`

```json
{
  "suite":         "OSGT-HW-P256-v1",
  "ephemeral_pub": "<65-byte SEC1 uncompressed P-256 ephemeral public key>",
  "nonce":         "<24-byte XChaCha20-Poly1305 nonce>",
  "wrapped_dek":   "<DEK ciphertext + 16-byte tag>"
}
```

KDF: `HKDF-SHA256(salt=None, ikm=ss_p256, info="oversight-hw-p256-v1-dek-wrap",
L=32)`. AAD on `wrapped_dek`: `"oversight-hw-p256-dek"`.

A polymorphic open implementation MUST dispatch on the unsigned
`suite_id` header (after the manifest-suite consistency check), parse
the corresponding shape, and reject any envelope whose ephemeral public
key length does not match the suite's curve. Mixing keys across suites
is a misuse and MUST be rejected rather than silently produce a derived
shared secret.
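
A minimal sketch of that dispatch, assuming the hex encoding and field names shown in the three shapes above. `parse_wrapped_dek` and the rule table are hypothetical names; the nonce and ML-KEM ciphertext length checks follow the sizes quoted in the JSON shapes.

```python
import json

# suite_id -> (name of the ephemeral-key field, expected length in bytes)
EPHEMERAL_KEY_RULES = {
    1: ("ephemeral_pub", 32),         # X25519
    2: ("x25519_ephemeral_pub", 32),  # X25519 half of the hybrid
    3: ("ephemeral_pub", 65),         # SEC1 uncompressed P-256
}


def _hex_field(obj: dict, name: str) -> bytes:
    """Decode one hex-encoded field, failing uniformly if absent or malformed."""
    try:
        return bytes.fromhex(obj[name])
    except (KeyError, TypeError, ValueError):
        raise ValueError(f"missing or non-hex field: {name}") from None


def parse_wrapped_dek(suite_id: int, raw: bytes) -> dict:
    """Parse the per-suite wrapped_dek JSON and enforce key-length rules."""
    if suite_id not in EPHEMERAL_KEY_RULES:
        raise ValueError("unknown suite_id")
    obj = json.loads(raw)
    if not isinstance(obj, dict):
        raise ValueError("wrapped_dek is not a JSON object")

    key_field, key_len = EPHEMERAL_KEY_RULES[suite_id]
    ephemeral = _hex_field(obj, key_field)
    if len(ephemeral) != key_len:
        raise ValueError("ephemeral key length does not match suite")
    if suite_id == 3 and ephemeral[0] != 0x04:
        raise ValueError("P-256 key is not SEC1 uncompressed")

    out = {
        "ephemeral_pub": ephemeral,
        "nonce": _hex_field(obj, "nonce"),
        "wrapped_dek": _hex_field(obj, "wrapped_dek"),
    }
    if len(out["nonce"]) != 24:
        raise ValueError("nonce must be 24 bytes")
    if suite_id == 2:
        out["mlkem_ciphertext"] = _hex_field(obj, "mlkem_ciphertext")
        if len(out["mlkem_ciphertext"]) != 1088:
            raise ValueError("ML-KEM-768 ciphertext must be 1088 bytes")
    return out
```

Checking the length before any key agreement means a 32-byte X25519 key presented under suite 3, or a 65-byte P-256 key under suite 1, fails early instead of reaching the ECDH step.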
 
### 5.2 Manifest

The manifest is canonical JSON per RFC 8785 (JCS: keys sorted by UTF-16 code unit, no whitespace, non-ASCII emitted as raw UTF-8). Required fields:

- `file_id` (UUID v4)
- `issued_at` (unix seconds, UTC)
- `version` (`"OVERSIGHT-v1"`)
- `suite` (suite identifier string)
- `content_hash` (hex SHA-256 of plaintext)
- `canonical_content_hash` (hex SHA-256 of the source bytes before
  L3/L2/L1 watermarking; used to resolve wording disputes)
- `size_bytes` (plaintext length)
- `issuer_id` (string)
- `issuer_ed25519_pub` (hex)
- `recipient` (object: `recipient_id`, `x25519_pub`, optional `ed25519_pub`)
- `signature_ed25519` (hex, Ed25519 over canonical bytes without signature fields)

Optional fields:

- `original_filename`, `content_type`
- `watermarks` (array of `{layer, mark_id}`)
- `beacons` (array of beacon descriptors)
- `policy` (`not_after`, `max_opens`, `jurisdiction`, `registry_url`, `require_attestation`)
- `l3_policy` (object describing L3 mode, document class, disclosure state,
  and safety rationale)
- `signature_ml_dsa` (hex, for HYBRID suites)
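
The "canonical bytes without signature fields" can be approximated with the standard library as follows. This is a sketch, not a full RFC 8785 implementation: `json.dumps` matches JCS for manifests whose keys are ASCII and whose values are strings, integers, booleans, arrays, and objects, but it does not reproduce JCS number formatting for floats or UTF-16 key ordering for characters outside the Basic Multilingual Plane. Treating both `signature_ed25519` and `signature_ml_dsa` as the fields to strip is an assumption.

```python
import json

# Assumption: these are the "signature fields" excluded before signing.
SIGNATURE_FIELDS = ("signature_ed25519", "signature_ml_dsa")


def signing_bytes(manifest: dict) -> bytes:
    """Return the canonical byte string that the issuer signature covers."""
    unsigned = {k: v for k, v in manifest.items() if k not in SIGNATURE_FIELDS}
    text = json.dumps(
        unsigned,
        sort_keys=True,          # lexicographic key order (JCS-equivalent for ASCII keys)
        separators=(",", ":"),   # no insignificant whitespace
        ensure_ascii=False,      # emit non-ASCII as raw UTF-8, not \u escapes
    )
    return text.encode("utf-8")
```

A verifier would recompute `signing_bytes` from the parsed manifest and check the Ed25519 signature over the result; comparing the recomputed bytes with the bytes on the wire also detects a non-canonical manifest.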
 
### 5.3 DEK wrapping

A fresh 32-byte DEK is generated per file. The wrapping procedure for CLASSIC-v1:

1. Generate ephemeral X25519 keypair `(eph_sk, eph_pk)`.
2. Compute `ss = X25519(eph_sk, recipient_x25519_pub)`.
3. Derive `kek = HKDF-SHA256(ss, salt=nil, info="oversight-v1-dek-wrap", len=32)`.
4. Encrypt DEK: `(nonce, ct) = XChaCha20-Poly1305(kek, DEK, aad="oversight-dek")`.
5. Store `{eph_pk, nonce, ct}` as `wrapped_dek`.
 
### 5.4 AEAD binding

The ciphertext AEAD takes `AAD = content_hash` (the hex string from the manifest). This binds the ciphertext to the signed manifest; an attacker cannot swap ciphertexts between manifests without breaking the AEAD tag.

### 5.5 Post-decrypt verification

After decryption, the implementation MUST verify that `SHA-256(plaintext) == manifest.content_hash`. If not, discard the plaintext.
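
The post-decrypt check can be sketched in a few lines. `verify_plaintext` is a hypothetical helper; the `size_bytes` comparison is an extra sanity check drawn from the manifest fields, not a requirement of this subsection.

```python
import hashlib
import hmac


def verify_plaintext(plaintext: bytes, manifest: dict) -> bytes:
    """Return the plaintext only if it matches the signed manifest; raise otherwise."""
    # Extra sanity check (assumption): compare against the declared length.
    if len(plaintext) != manifest["size_bytes"]:
        raise ValueError("size_bytes mismatch; discard plaintext")
    actual = hashlib.sha256(plaintext).hexdigest()  # lowercase hex
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(actual, manifest["content_hash"]):
        raise ValueError("content_hash mismatch; discard plaintext")
    return plaintext
```

Raising instead of returning a flag makes it harder for a caller to use the plaintext by accident after a failed check.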
 
## 6. Watermarking

Watermarking is optional but RECOMMENDED. Each applied layer registers a
`mark_id` in the manifest. L3 semantic watermarking changes visible prose and
is therefore opt-in for wording-sensitive classes. Implementations MUST
default L3 off for legal documents, regulatory filings, technical
specifications, source code, SQL, logs, and structured data unless the user
explicitly enables and acknowledges the textual change.

### 6.1 Layer identifiers

- `L1_zero_width` - zero-width unicode characters scattered through text payloads
- `L2_whitespace` - trailing space vs tab at line endings
- `L3_synonyms` - legacy synonym-class rotation identifier
- `L3_semantic_full` - guarded semantic marks over eligible prose regions
- `L3_semantic_boilerplate` - guarded semantic marks limited to header/footer/cover-page regions
- `L4_dct_visual` - reserved; for image payloads
- `L5_layout` - reserved; for PDF/document layout perturbation
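
To make the `L2_whitespace` layer concrete, here is one possible encoding, offered as a sketch only. It assumes one bit per line, most significant bit first, with a trailing space for 0 and a trailing tab for 1; the specification does not fix these details, and `embed_l2` / `extract_l2` are hypothetical names.

```python
from typing import Optional


def embed_l2(text: str, mark_id: int, bits: int = 64) -> str:
    """Encode mark_id into the trailing whitespace of the first `bits` lines."""
    lines = text.split("\n")
    if len(lines) < bits:
        raise ValueError("not enough lines to carry the mark")
    out = []
    for i, line in enumerate(lines):
        line = line.rstrip(" \t")  # clear any existing trailing whitespace
        if i < bits:
            bit = (mark_id >> (bits - 1 - i)) & 1
            line += "\t" if bit else " "
        out.append(line)
    return "\n".join(out)


def extract_l2(text: str, bits: int = 64) -> Optional[int]:
    """Recover mark_id, or None if the trailing-whitespace pattern is absent."""
    lines = text.split("\n")
    if len(lines) < bits:
        return None
    value = 0
    for line in lines[:bits]:
        if line.endswith("\t"):
            value = (value << 1) | 1
        elif line.endswith(" "):
            value <<= 1
        else:
            return None  # a carrier line lost its mark
    return value
```

Any tool that trims trailing whitespace removes this layer entirely, which is one reason to combine it with other layers.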
 
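### 6.2 Mark IDs

Mark IDs are 64-bit random values. With 2^32 marks already issued, the probability that a newly generated mark ID collides with an existing one is ~2^-32. The birthday-bound probability that any two of those 2^32 marks share an ID is much higher, roughly 39%.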
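
A collision figure for 64-bit IDs can be read two ways, and both are easy to check numerically. The sketch below (function names are illustrative) computes the chance that one new ID hits an existing mark and the standard birthday approximation for any two marks colliding.

```python
import math


def p_new_mark_collides(issued: int, id_bits: int = 64) -> float:
    """Probability that one fresh random ID equals any of `issued` existing IDs."""
    return issued / 2 ** id_bits


def p_any_collision(issued: int, id_bits: int = 64) -> float:
    """Birthday approximation: 1 - exp(-n(n-1) / (2N)) for n IDs in a space of N."""
    space = 2 ** id_bits
    exponent = issued * (issued - 1) / (2 * space)
    return -math.expm1(-exponent)  # numerically stable 1 - exp(-x)


if __name__ == "__main__":
    n = 2 ** 32
    print(f"per-new-mark: {p_new_mark_collides(n):.3e}")  # 2^-32
    print(f"any pair:     {p_any_collision(n):.3f}")
```

The per-mark figure is 2^-32, while the any-pair figure at 2^32 marks is close to 0.39, so the two readings differ by many orders of magnitude.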
 
### 6.3 Recovery

A leaked plaintext is scanned by all supported layer extractors. Each recovered `mark_id` is queried against the registry. A match returns `(file_id, recipient_id, issuer_id)`.

Implementations SHOULD use multiple layers so that defeating one does not defeat attribution.
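
The recovery flow can be sketched with a toy `L1_zero_width` extractor and an in-memory registry. Everything here is illustrative: the choice of U+200B for 0 and U+200C for 1, the placement of the mark as a single run after the first word (a real embedder would scatter it), and the dictionary standing in for the registry are all assumptions.

```python
from typing import Callable, Dict, List, Optional, Tuple

ZW0, ZW1 = "\u200b", "\u200c"  # assumption: ZWSP encodes 0, ZWNJ encodes 1
Attribution = Tuple[str, str, str]  # (file_id, recipient_id, issuer_id)


def embed_l1(text: str, mark_id: int, bits: int = 64) -> str:
    """Hide mark_id as zero-width characters after the first word."""
    payload = "".join(
        ZW1 if (mark_id >> (bits - 1 - i)) & 1 else ZW0 for i in range(bits)
    )
    head, sep, tail = text.partition(" ")
    return head + payload + sep + tail


def extract_l1(text: str, bits: int = 64) -> Optional[int]:
    """Collect zero-width characters in order and decode the first `bits` of them."""
    stream = [c for c in text if c in (ZW0, ZW1)]
    if len(stream) < bits:
        return None
    value = 0
    for c in stream[:bits]:
        value = (value << 1) | (1 if c == ZW1 else 0)
    return value


def attribute(
    leaked: str,
    extractors: Dict[str, Callable[[str], Optional[int]]],
    registry: Dict[int, Attribution],
) -> List[Tuple[str, Attribution]]:
    """Run every layer extractor and look up each recovered mark_id."""
    hits = []
    for layer, extract in extractors.items():
        mark_id = extract(leaked)
        if mark_id is not None and mark_id in registry:
            hits.append((layer, registry[mark_id]))
    return hits
```

Running every extractor and collecting all hits, rather than stopping at the first, lets an investigator see which layers survived a given leak path.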
 
## 7. Beacons

### 7.1 Types

| Kind       | Channel | Triggered by                                                  |
|------------|---------|---------------------------------------------------------------|
| `dns`      | DNS     | Document rendering, network-aware readers, preview pipelines  |
| `http_img` | HTTPS   | `<img>` tags in HTML/Office/PDF/SVG                           |
| `ocsp`     | HTTPS   | Certificate revocation checks                                 |
| `license`  | HTTPS   | Explicit license-server check (policy-enforced)               |

### 7.2 Token format

Each beacon carries a 128-bit unguessable `token_id`. The registry maps `token_id → (file_id, recipient_id, issuer_id)`.
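
A minimal sketch of token issuance, assuming hex-encoded tokens and a hypothetical registry host (`registry.example`). The HTTPS paths follow the endpoint table in Section 8.1; the DNS name layout is an assumption, since this document does not define one.

```python
import secrets
from typing import Dict, Tuple

REGISTRY_HOST = "registry.example"  # hypothetical host, not part of the spec

# Beacon kind -> URL template; HTTPS paths follow the Section 8.1 endpoint table.
URL_TEMPLATES = {
    "http_img": "https://{host}/p/{token}.png",
    "ocsp": "https://{host}/r/{token}",
    "license": "https://{host}/v/{token}",
    "dns": "{token}.b.{host}",  # assumption: token as the leftmost DNS label
}


def new_beacon(
    kind: str,
    file_id: str,
    recipient_id: str,
    issuer_id: str,
    registry: Dict[str, Tuple[str, str, str]],
) -> dict:
    """Mint a 128-bit token, record its mapping, and return a beacon descriptor."""
    if kind not in URL_TEMPLATES:
        raise ValueError(f"unknown beacon kind: {kind}")
    token_id = secrets.token_hex(16)  # 16 random bytes = 128 bits, 32 hex chars
    registry[token_id] = (file_id, recipient_id, issuer_id)
    return {
        "kind": kind,
        "token_id": token_id,
        "url": URL_TEMPLATES[kind].format(host=REGISTRY_HOST, token=token_id),
    }
```

`secrets` draws from the operating system's CSPRNG, which is what makes the token unguessable; `random` would not be suitable here.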
 
### 7.3 Passive-only requirement

Beacons MUST NOT cause code execution on the reader. A beacon is a network callback that a standard renderer makes naturally; it does not require a plugin, macro, or active payload.
 
## 8. Registry

### 8.1 Endpoints

A compliant registry exposes:

| Method | Path                       | Purpose                                 |
|--------|----------------------------|-----------------------------------------|
| POST   | `/register`                | Issuer registers a file's beacons+marks |
| GET    | `/p/{token_id}.png`        | HTTP image beacon receiver              |
| GET    | `/r/{token_id}`            | OCSP-style beacon receiver              |
| GET    | `/v/{token_id}`            | License-check beacon receiver           |
| POST   | `/attribute`               | Query by token_id or mark_id            |
| GET    | `/evidence/{file_id}`      | Assemble evidence bundle                |
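
The endpoint table maps naturally onto a small route matcher. The sketch below is framework-independent and illustrative only: the handler names are hypothetical, and the patterns assume `token_id` is 32 lowercase hex characters and `file_id` is a hyphenated UUID, which the table itself does not state.

```python
import re
from typing import Dict, Optional, Tuple

TOKEN = r"(?P<token_id>[0-9a-f]{32})"         # assumption: 128-bit token as hex
FILE_ID = r"(?P<file_id>[0-9a-f-]{36})"       # assumption: UUID in 8-4-4-4-12 form

# (method, compiled path pattern, hypothetical handler name)
ROUTES = [
    ("POST", re.compile(r"^/register$"), "register"),
    ("GET", re.compile(r"^/p/" + TOKEN + r"\.png$"), "beacon_http_img"),
    ("GET", re.compile(r"^/r/" + TOKEN + r"$"), "beacon_ocsp"),
    ("GET", re.compile(r"^/v/" + TOKEN + r"$"), "beacon_license"),
    ("POST", re.compile(r"^/attribute$"), "attribute"),
    ("GET", re.compile(r"^/evidence/" + FILE_ID + r"$"), "evidence"),
]


def route(method: str, path: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (handler name, path parameters) for a request, or None if unmatched."""
    for route_method, pattern, handler in ROUTES:
        if method != route_method:
            continue
        match = pattern.match(path)
        if match:
            return handler, match.groupdict()
    return None
```

Anchoring each pattern with `^` and `$` keeps a malformed token from matching a beacon route, so the receiver never logs an event for an ID that could not have been issued.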
 
### 8.2 Qualified timestamps

Production registries MUST timestamp events via RFC 3161 against at least one qualified Time Stamping Authority (TSA). Evidence bundles MUST include the TimeStampToken(s).

### 8.3 Transparency log

Production registries SHOULD chain events into an append-only transparency log (Sigstore-style Merkle log) so that registry operators cannot fabricate or suppress events undetected.
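
To show why a Merkle log makes tampering detectable, the sketch below computes a tree root in the style of RFC 6962 (the hashing scheme used by Certificate Transparency logs). It is an illustration under that assumption, not the registry's defined log format; inclusion and consistency proofs are omitted.

```python
import hashlib
from typing import Sequence


def leaf_hash(entry: bytes) -> bytes:
    """Hash a log entry with the 0x00 leaf prefix (domain separation from nodes)."""
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    """Hash two child hashes with the 0x01 interior-node prefix."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(entries: Sequence[bytes]) -> bytes:
    """Compute the Merkle tree head over an ordered list of entries."""
    n = len(entries)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        return leaf_hash(entries[0])
    # Split at the largest power of two strictly less than n.
    k = 1
    while k * 2 < n:
        k *= 2
    return node_hash(merkle_root(entries[:k]), merkle_root(entries[k:]))
```

Because every root commits to all earlier entries, an operator who alters or drops a past event produces a root that no longer matches the one previously published to auditors.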
 
### 8.4 Jurisdictional profiles

Registries MUST publish a jurisdictional profile declaring:

- Data residency (where event logs are stored)
- Permitted field collection per event (IP, UA, geolocation, etc.)
- Retention period
- Cross-border data-sharing policy

The manifest `policy.jurisdiction` MUST match the registry's profile or the seal MUST be rejected.
 
## 9. Evidence bundles

An evidence bundle is a JSON artifact containing:

1. The original signed manifest
2. All registered beacons and watermarks
3. Chronologically ordered event log
4. Qualified timestamps for each event
5. Registry's own signature over the bundle
6. Transparency-log inclusion proofs

The bundle is the foundation for a forensic report per ISO/IEC 27037. A court-admissible final report requires additional human-in-the-loop procedures: examiner qualifications, methodology documentation, and proper preservation of the original blob.
 
## 10. Security considerations

### 10.1 Key compromise

- Issuer key compromise allows forged manifests for the compromise window. Mitigation: short-lived issuer keys, certificate transparency, a revocation list.
- Recipient key compromise allows decryption of all files ever sealed for that recipient. Mitigation: per-purpose recipient keys, forward-secret variants (future work).

### 10.2 Replay

Ciphertext is bound to manifest via AEAD AAD. Manifest is signed and uniquely identified by `file_id`. Replay of a full sealed blob is equivalent to possession of the blob.
 
### 10.3 Side channels

Implementations MUST use constant-time code for all cryptographic primitives. Watermark-embedding timing may leak whether a recipient is being marked; embed times SHOULD be bounded.
 
### 10.4 Metadata exposure

The manifest is not encrypted. An attacker who captures a sealed blob learns the recipient, issuer, beacons, and watermark IDs. This is intentional: third parties (legal discovery, compliance auditors) must be able to inspect the metadata without holding the decryption key. Sensitive fields SHOULD be hashed or omitted from the manifest if their disclosure is unacceptable.

### 10.5 Traffic analysis of beacons

Beacon callbacks reveal that a sealed file was opened. In hostile environments an attacker who blocks outbound traffic will suppress beacon callbacks. The protocol does not claim to defeat such an attacker; watermarking provides the post-escape attribution path.
 
## 11. IANA considerations

- Reserved media type: `application/vnd.oversight.sealed`
- Reserved file extension: `.sealed`
- Reserved URN namespace: `urn:oversight:file:<file_id>`
 
## 12. References

- FIPS 203: Module-Lattice-Based Key-Encapsulation Mechanism
- FIPS 204: Module-Lattice-Based Digital Signature Standard
- RFC 7748: Elliptic Curves for Security (X25519)
- RFC 8032: Edwards-Curve Digital Signature Algorithm (EdDSA)
- RFC 5869: HMAC-based Extract-and-Expand Key Derivation Function (HKDF)
- RFC 8785: JSON Canonicalization Scheme (JCS)
- RFC 3161: Time-Stamp Protocol (TSP)
- ISO/IEC 27037: Guidelines for identification, collection, acquisition and preservation of digital evidence
- C2PA 2.3: Content Credentials specification
- draft-irtf-cfrg-xchacha: XChaCha20-Poly1305
 
## 13. Appendix A - Test vectors (normative)

Cross-language conformance scripts live at `oversight-rust/tests/conformance_*.sh`
and assert byte-identical seal/open and Rekor DSSE/PAE between the Python
reference and the Rust port. Implementations SHOULD run them on every change
and SHOULD add published byte-exact vectors for every suite they ship.