| 1 | # OVERSIGHT Protocol Specification |
| 2 | |
| 3 | **Sealed Entity, Notarized Trust, Integrity & Evidence Layer** |
| 4 | |
| 5 | Version 0.5 - Draft - April 2026 |
| 6 | |
| 7 | --- |
| 8 | |
| 9 | ## 1. Status |
| 10 | |
| 11 | This document is a draft specification for an open protocol for data provenance, attribution, and leak detection. It is intended for eventual submission as a standards-track RFC following independent cryptographic review. |
| 12 | |
| 13 | ## 2. Goals and non-goals |
| 14 | |
| 15 | ### 2.1 Goals |
| 16 | |
| 17 | The protocol MUST: |
| 18 | |
| 19 | - Produce a file container format (`.sealed`) that wraps arbitrary payloads in an authenticated, recipient-bound cryptographic envelope. |
| 20 | - Allow post-quantum cryptographic agility without breaking existing sealed files. |
| 21 | - Bind every sealed file to a specific recipient identity via a signed manifest. |
| 22 | - Carry per-recipient watermarking identifiers that survive plaintext escape. |
| 23 | - Carry per-recipient passive beacon tokens that fire on open via standard rendering behaviors (DNS resolution, image fetch, certificate check) without executing code on the reader. |
| 24 | - Support distributed, jurisdiction-aware attribution registries. |
| 25 | - Produce evidence artifacts suitable as the foundation of a court-admissible chain-of-custody report. |
| 26 | - Be format-agnostic: the payload is opaque bytes; the protocol does not care whether it wraps DOCX, PDF, MP4, JSON, or raw bytes. |
| 27 | - Be open, reviewable, and free of proprietary dependencies. |
| 28 | |
| 29 | ### 2.2 Non-goals |
| 30 | |
| 31 | The protocol does NOT: |
| 32 | |
| 33 | - Execute code of any kind on the reader's machine. No active payloads. No RATs. |
| 34 | - Prevent all leaks. Plaintext, once decrypted, can be retyped, photographed, or OCR'd. The protocol's defense is attribution, not prevention. |
| 35 | - Provide DRM in the film-industry sense (playback restrictions, output protection). It provides attribution and revocation. |
| 36 | - Authenticate the truth of content. Like C2PA, OVERSIGHT proves who signed what for whom; it does not verify the claims in the content itself. |
| 37 | |
| 38 | ## 3. Threat model |
| 39 | |
| 40 | ### 3.1 Assumptions |
| 41 | |
| 42 | - The issuer controls its signing keys and operates a registry (or delegates to a federated operator). |
| 43 | - The intended recipient controls its decryption keys. |
| 44 | - The network between recipient and registry is untrusted but standard TLS is available. |
| 45 | |
| 46 | ### 3.2 Adversaries |
| 47 | |
| 48 | The protocol defends against: |
| 49 | |
| 50 | | Adversary | Capability | Defense | |
| 51 | |-----------|------------|---------| |
| 52 | | Passive interceptor | Captures sealed file in transit | AEAD, recipient-bound DEK | |
| 53 | | Curious insider | Receives file, shares with third party | Per-recipient watermarking → attribution | |
| 54 | | Thief with wrong key | Steals sealed file, has no decryption key | ECDH/KEM unwrap fails | |
| 55 | | Tamperer | Modifies ciphertext or manifest | AEAD tag + manifest signature + content-hash verify | |
| 56 | | Format-conversion attacker | Decrypts, converts to PDF/screenshot, posts plaintext | Multi-layer watermarking; attribution via registry match | |
| 57 | | Metadata-stripping attacker | Re-serializes file to remove marks | Attack defeats L2 whitespace marks; L1 zero-width and L3 semantic marks survive | |
| 58 | | Nation-state with quantum computer (future) | Decrypts classical ciphertexts | Hybrid mode: ML-KEM + X25519 | |
| 59 | |
| 60 | The protocol does NOT defend against: |
| 61 | |
| 62 | - The fully air-gapped attacker who also OCRs or retypes the document and distributes only the retyped copy. (Semantic/synonym watermarks are the only defense; they are probabilistic.) |
| 63 | - An attacker who compromises the issuer's signing key. (Key rotation and revocation logs are the mitigation.) |
| 64 | - An attacker who owns the registry infrastructure. (Use a federated/transparency-log registry; mitigate with jurisdictional profiles.) |
| 65 | |
| 66 | ## 4. Cryptographic primitives |
| 67 | |
| 68 | ### 4.1 Algorithm suites |
| 69 | |
| 70 | Every sealed file declares a `suite` in its manifest. Implementations MUST reject unknown suites. |
| 71 | |
| 72 | #### 4.1.1 `OSGT-CLASSIC-v1` (suite_id = 1) |
| 73 | |
| 74 | - Key agreement: X25519 (RFC 7748) |
| 75 | - KDF: HKDF-SHA256 (RFC 5869), info = `"oversight-v1-dek-wrap"` |
| 76 | - AEAD: XChaCha20-Poly1305 (draft-irtf-cfrg-xchacha) |
| 77 | - Signature: Ed25519 (RFC 8032) |
| 78 | - Hash: SHA-256 |
| 79 | |
| 80 | #### 4.1.2 `OSGT-HYBRID-v1` (suite_id = 2) |
| 81 | |
| 82 | All primitives of CLASSIC-v1, plus: |
| 83 | |
| 84 | - KEM: ML-KEM-768 (FIPS 203), combined with X25519 using hybrid KDF |
| 85 | - Signature: ML-DSA-65 (FIPS 204), combined with Ed25519 (dual signatures) |
| 86 | |
| 87 | Hybrid key establishment combines the two shared secrets and binds both ephemeral inputs, exactly as specified for the `OSGT-HYBRID-v1` `wrapped_dek` shape: |
| 88 | |
| 89 | ``` |
| 90 | hybrid_ss = HKDF-SHA256( |
| 91 | salt = None, |
| 92 | ikm = x25519_ss || mlkem_ss || x25519_eph_pub || mlkem_ct, |
| 93 | info = "oversight-hybrid-v1-dek-wrap", |
| 94 | len = 32 |
| 95 | ) |
| 96 | ``` |
| 97 | |
| 98 | Hybrid signatures attach both signatures to the manifest. Verification requires BOTH to validate. |
| 99 | |
| 100 | #### 4.1.3 `OSGT-HW-P256-v1` (suite_id = 3) |
| 101 | |
| 102 | For recipients whose private key lives in a PIV-compatible hardware token |
| 103 | (YubiKey, Nitrokey, OnlyKey). The token performs the ECDH on-device; the |
| 104 | private scalar never leaves the device. |
| 105 | |
| 106 | - Key agreement: ECDH on NIST P-256 (FIPS 186-5 / SEC1) |
| 107 | - Recipient public key: SEC1 uncompressed encoding (65 bytes, |
| 108 | `0x04 || X || Y`); recorded in the manifest as `recipient.p256_pub` (hex) |
| 109 | - KDF: HKDF-SHA256 (RFC 5869), `salt = None`, `info = "oversight-hw-p256-v1-dek-wrap"` |
| 110 | - AEAD: XChaCha20-Poly1305, `aad = "oversight-hw-p256-dek"` |
| 111 | - Signature: Ed25519 (issuer); the recipient's hardware suite does not |
| 112 | affect the issuer signature path |
| 113 | - Hash: SHA-256 |
| 114 | |
| 115 | The `wrapped_dek` JSON for this suite is: |
| 116 | |
| 117 | ```json |
| 118 | { |
| 119 | "suite": "OSGT-HW-P256-v1", |
| 120 | "ephemeral_pub": "<hex of SEC1 uncompressed P-256 ephemeral pubkey, 65 bytes>", |
| 121 | "nonce": "<hex, 24 bytes>", |
| 122 | "wrapped_dek": "<hex, AEAD ciphertext including 16-byte tag>" |
| 123 | } |
| 124 | ``` |
| 125 | |
| 126 | The sender holds no hardware key. The ephemeral keypair is generated locally |
| 127 | in software; only the recipient's public key needs to come off the token |
| 128 | (typically via PKCS#11 `C_GetAttributeValue` once at recipient enrollment). |
| 129 | |
| 130 | P-256 was chosen over X25519 for compatibility with the broadest set of PIV |
| 131 | deployments. PIV slots historically support only P-256 and P-384; Curve25519 |
| 132 | support on hardware tokens varies by vendor, applet, and firmware version. |
| 133 | Cryptographic strength is comparable: both X25519 and P-256 ECDH offer |
| 134 | ~128-bit security. |
| 135 | |
| 136 | ### 4.2 Custom cryptography is PROHIBITED |
| 137 | |
| 138 | Implementations MUST NOT introduce new cryptographic primitives. The suite identifiers are reserved; new suites may only be added via specification update after independent review. |
| 139 | |
| 140 | ## 5. Container format |
| 141 | |
| 142 | ### 5.1 Wire layout |
| 143 | |
| 144 | All integers are unsigned big-endian. |
| 145 | |
| 146 | ``` |
| 147 | offset length field notes |
| 148 | ------ -------- ----------------- --------------------------------- |
| 149 | 0 6 magic 0x53 0x4E 0x54 0x4C 0x01 0x00 ("SNTL\x01\x00") |
| 150 | 6 1 format_version MUST be 0x01 |
| 151 | 7 1 suite_id 1 = CLASSIC_v1, 2 = HYBRID_v1, 3 = HW_P256_v1 |
| 152 | 8 4 manifest_len length of manifest JSON in bytes |
| 153 | 12 M manifest canonical JSON (signed) |
| 154 | 12+M 4 wrapped_dek_len |
| 155 | ... W wrapped_dek JSON; per-suite shape (see 5.1.1) |
| 156 | ... 24 aead_nonce XChaCha20-Poly1305 nonce |
| 157 | ... 4 ciphertext_len |
| 158 | ... C ciphertext AEAD output, includes 16-byte tag |
| 159 | ``` |
| 160 | |
| 161 | Implementations MUST reject any `.sealed` file whose unauthenticated `suite_id` |
| 162 | header byte does not match the signed `manifest.suite` value, and MUST reject |
| 163 | trailing bytes after the declared ciphertext region. |
| 164 | |
| 165 | ### 5.1.1 `wrapped_dek` JSON shape per suite |
| 166 | |
| 167 | The `wrapped_dek` byte range holds a canonical-JSON object whose fields |
| 168 | depend on the manifest's declared `suite`. All byte values are lowercase |
| 169 | hex unless otherwise noted. |
| 170 | |
| 171 | #### `OSGT-CLASSIC-v1` |
| 172 | |
| 173 | ```json |
| 174 | { |
| 175 | "ephemeral_pub": "<32-byte X25519 ephemeral public key>", |
| 176 | "nonce": "<24-byte XChaCha20-Poly1305 nonce>", |
| 177 | "wrapped_dek": "<DEK ciphertext + 16-byte tag>" |
| 178 | } |
| 179 | ``` |
| 180 | |
| 181 | KDF: `HKDF-SHA256(salt=None, ikm=ss_x, info="oversight-v1-dek-wrap", L=32)`. |
| 182 | AAD on `wrapped_dek`: `"oversight-dek"`. |
| 183 | |
| 184 | #### `OSGT-HYBRID-v1` |
| 185 | |
| 186 | ```json |
| 187 | { |
| 188 | "suite": "OSGT-HYBRID-v1", |
| 189 | "x25519_ephemeral_pub": "<32-byte X25519 ephemeral public key>", |
| 190 | "mlkem_ciphertext": "<1088-byte ML-KEM-768 ciphertext>", |
| 191 | "nonce": "<24-byte XChaCha20-Poly1305 nonce>", |
| 192 | "wrapped_dek": "<DEK ciphertext + 16-byte tag>" |
| 193 | } |
| 194 | ``` |
| 195 | |
| 196 | KDF: `HKDF-SHA256(salt=None, ikm=ss_x || ss_pq || x25519_eph_pub || mlkem_ct, |
| 197 | info="oversight-hybrid-v1-dek-wrap", L=32)`. AAD on `wrapped_dek`: |
| 198 | `"oversight-hybrid-dek"`. The X-Wing-style binding over both shared |
| 199 | secrets and both ephemeral inputs ties the derived key to the exact exchange, |
| 200 | so substituting a different ephemeral key or ML-KEM ciphertext changes the key and the unwrap fails. |
| 201 | |
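The hybrid KDF above can be sketched with the Python standard library alone. This is a minimal illustration, not the reference implementation: `hkdf_sha256` and `hybrid_kek` are hypothetical helper names, and the shared secrets, ephemeral key, and ML-KEM ciphertext are placeholder byte strings of the sizes the spec lists rather than outputs of real X25519 or ML-KEM operations.

```python
import hashlib
import hmac
from typing import Optional


def hkdf_sha256(ikm: bytes, salt: Optional[bytes], info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand it."""
    if salt is None:
        # RFC 5869: an absent salt is treated as HashLen zero bytes.
        salt = b"\x00" * hashlib.sha256().digest_size
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def hybrid_kek(ss_x: bytes, ss_pq: bytes, x25519_eph_pub: bytes, mlkem_ct: bytes) -> bytes:
    """Derive the 32-byte wrapping key from both secrets and both ephemeral inputs."""
    ikm = ss_x + ss_pq + x25519_eph_pub + mlkem_ct
    return hkdf_sha256(ikm, None, b"oversight-hybrid-v1-dek-wrap", 32)


# Placeholder inputs (assumed sizes: 32-byte secrets, 32-byte key, 1088-byte ciphertext).
ss_x, ss_pq = b"\x01" * 32, b"\x02" * 32
eph_pub, mlkem_ct = b"\x03" * 32, b"\x04" * 1088
kek = hybrid_kek(ss_x, ss_pq, eph_pub, mlkem_ct)
# Changing one byte of the ML-KEM ciphertext yields a different wrapping key.
kek_tampered = hybrid_kek(ss_x, ss_pq, eph_pub, b"\x05" + mlkem_ct[1:])
```

Because the ciphertext and ephemeral key are part of the KDF input, a substituted envelope derives a different key and the AEAD unwrap of the DEK fails.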
| 202 | #### `OSGT-HW-P256-v1` |
| 203 | |
| 204 | ```json |
| 205 | { |
| 206 | "suite": "OSGT-HW-P256-v1", |
| 207 | "ephemeral_pub": "<65-byte SEC1 uncompressed P-256 ephemeral public key>", |
| 208 | "nonce": "<24-byte XChaCha20-Poly1305 nonce>", |
| 209 | "wrapped_dek": "<DEK ciphertext + 16-byte tag>" |
| 210 | } |
| 211 | ``` |
| 212 | |
| 213 | KDF: `HKDF-SHA256(salt=None, ikm=ss_p256, info="oversight-hw-p256-v1-dek-wrap", |
| 214 | L=32)`. AAD on `wrapped_dek`: `"oversight-hw-p256-dek"`. |
| 215 | |
| 216 | A polymorphic open implementation MUST dispatch on the unauthenticated |
| 217 | `suite_id` header byte (after the manifest-suite consistency check), parse |
| 218 | the corresponding shape, and reject any envelope whose ephemeral public |
| 219 | key length does not match the suite's curve. Mixing keys across suites |
| 220 | is a misuse and MUST be rejected rather than silently producing a derived |
| 221 | shared secret. |
| 222 | |
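The wire layout and the rejection rules above could be exercised with a parser along these lines. It is a sketch under stated assumptions: `parse_sealed` and `build_demo` are hypothetical names, manifest signature verification is omitted entirely (a real open path would verify the signature over the raw manifest bytes before trusting `manifest.suite`), and the demo container holds dummy values rather than real key material.

```python
import json
import struct

MAGIC = bytes([0x53, 0x4E, 0x54, 0x4C, 0x01, 0x00])  # magic bytes from the wire layout
SUITES = {1: "OSGT-CLASSIC-v1", 2: "OSGT-HYBRID-v1", 3: "OSGT-HW-P256-v1"}
# Name and expected byte length of the ephemeral public key field, per suite.
EPHEMERAL_KEY = {1: ("ephemeral_pub", 32), 2: ("x25519_ephemeral_pub", 32), 3: ("ephemeral_pub", 65)}


def parse_sealed(blob: bytes) -> dict:
    if len(blob) < 8 or blob[:6] != MAGIC:
        raise ValueError("bad magic or truncated header")
    format_version, suite_id = blob[6], blob[7]
    if format_version != 0x01 or suite_id not in SUITES:
        raise ValueError("unsupported format_version or suite_id")
    pos = 8

    def take(n: int) -> bytes:
        nonlocal pos
        if pos + n > len(blob):
            raise ValueError("truncated container")
        chunk = blob[pos:pos + n]
        pos += n
        return chunk

    def take_prefixed() -> bytes:
        (length,) = struct.unpack(">I", take(4))  # unsigned big-endian length
        return take(length)

    manifest = json.loads(take_prefixed())
    wrapped_dek = json.loads(take_prefixed())
    aead_nonce = take(24)
    ciphertext = take_prefixed()
    if pos != len(blob):
        raise ValueError("trailing bytes after ciphertext")
    if manifest.get("suite") != SUITES[suite_id]:
        raise ValueError("suite_id header does not match manifest.suite")
    field, size = EPHEMERAL_KEY[suite_id]
    if len(bytes.fromhex(wrapped_dek[field])) != size:
        raise ValueError("ephemeral public key length does not match suite")
    return {"suite": SUITES[suite_id], "manifest": manifest, "wrapped_dek": wrapped_dek,
            "aead_nonce": aead_nonce, "ciphertext": ciphertext}


def build_demo(suite_id: int, suite: str, eph_hex: str, trailer: bytes = b"") -> bytes:
    """Assemble a dummy container (no real keys or ciphertext) for testing the parser."""
    manifest = json.dumps({"suite": suite}).encode()
    wrapped = json.dumps({"ephemeral_pub": eph_hex}).encode()
    ciphertext = b"\x00" * 32
    return (MAGIC + bytes([0x01, suite_id])
            + struct.pack(">I", len(manifest)) + manifest
            + struct.pack(">I", len(wrapped)) + wrapped
            + b"\x00" * 24
            + struct.pack(">I", len(ciphertext)) + ciphertext + trailer)


parsed = parse_sealed(build_demo(1, "OSGT-CLASSIC-v1", "ab" * 32))
```

Checking the key length before any key agreement means a 32-byte X25519 key can never be fed into the P-256 path, or the reverse.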
| 223 | ### 5.2 Manifest |
| 224 | |
| 225 | The manifest is canonical JSON per RFC 8785 (JCS: keys sorted by UTF-16 code unit, no whitespace, non-ASCII emitted as raw UTF-8). Required fields: |
| 226 | |
| 227 | - `file_id` (UUID v4) |
| 228 | - `issued_at` (unix seconds, UTC) |
| 229 | - `version` (`"OVERSIGHT-v1"`) |
| 230 | - `suite` (suite identifier string) |
| 231 | - `content_hash` (hex SHA-256 of plaintext) |
| 232 | - `canonical_content_hash` (hex SHA-256 of the source bytes before |
| 233 | L3/L2/L1 watermarking; used to resolve wording disputes) |
| 234 | - `size_bytes` (plaintext length) |
| 235 | - `issuer_id` (string) |
| 236 | - `issuer_ed25519_pub` (hex) |
| 237 | - `recipient` (object: `recipient_id`; `x25519_pub` or, for `OSGT-HW-P256-v1`, `p256_pub`; optional `ed25519_pub`) |
| 238 | - `signature_ed25519` (hex, Ed25519 over canonical bytes without signature fields) |
| 239 | |
| 240 | Optional fields: |
| 241 | |
| 242 | - `original_filename`, `content_type` |
| 243 | - `watermarks` (array of `{layer, mark_id}`) |
| 244 | - `beacons` (array of beacon descriptors) |
| 245 | - `policy` (`not_after`, `max_opens`, `jurisdiction`, `registry_url`, `require_attestation`) |
| 246 | - `l3_policy` (object describing L3 mode, document class, disclosure state, |
| 247 | and safety rationale) |
| 248 | - `signature_ml_dsa` (hex; required when `suite` is `OSGT-HYBRID-v1`, where verification needs both signatures) |
| 249 | |
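As an illustration of "canonical bytes without signature fields", the sketch below serializes a manifest with keys sorted by UTF-16 code unit, no whitespace, and raw UTF-8 output. It only approximates RFC 8785: floats are rejected instead of being formatted per the JCS number rules, and the helper names (`canonicalize`, `signing_bytes`) and the exact set of excluded signature fields are assumptions made for this example.

```python
import json

# Assumed set of fields excluded from the signed bytes.
SIGNATURE_FIELDS = ("signature_ed25519", "signature_ml_dsa")


def canonicalize(value) -> str:
    """Serialize a JSON value with JCS-style key ordering and no whitespace."""
    if isinstance(value, dict):
        # Sorting on the UTF-16BE encoding orders keys by UTF-16 code unit.
        items = sorted(value.items(), key=lambda kv: kv[0].encode("utf-16-be"))
        body = ",".join(json.dumps(k, ensure_ascii=False) + ":" + canonicalize(v) for k, v in items)
        return "{" + body + "}"
    if isinstance(value, list):
        return "[" + ",".join(canonicalize(v) for v in value) + "]"
    if isinstance(value, float):
        raise TypeError("floats need RFC 8785 number formatting; not handled in this sketch")
    return json.dumps(value, ensure_ascii=False)  # str, int, bool, None


def signing_bytes(manifest: dict) -> bytes:
    """Return the bytes an issuer would sign: the manifest minus its signature fields."""
    unsigned = {k: v for k, v in manifest.items() if k not in SIGNATURE_FIELDS}
    return canonicalize(unsigned).encode("utf-8")


demo_manifest = {
    "suite": "OSGT-CLASSIC-v1",
    "issued_at": 1,
    "file_id": "x",
    "signature_ed25519": "ab",
}
demo_bytes = signing_bytes(demo_manifest)
```

Signer and verifier must both derive these bytes from the parsed manifest in the same way; any difference in key order or escaping invalidates the signature.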
| 250 | ### 5.3 DEK wrapping |
| 251 | |
| 252 | A fresh 32-byte DEK is generated per file. The wrapping procedure for CLASSIC-v1: |
| 253 | |
| 254 | 1. Generate ephemeral X25519 keypair `(eph_sk, eph_pk)`. |
| 255 | 2. Compute `ss = X25519(eph_sk, recipient_x25519_pub)`. |
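| 256 | 3. Derive `kek = HKDF-SHA256(salt=None, ikm=ss, info="oversight-v1-dek-wrap", L=32)`. |
| 257 | 4. Generate a fresh 24-byte `nonce` and encrypt the DEK: `ct = XChaCha20-Poly1305(kek, nonce, DEK, aad="oversight-dek")`. |
| 258 | 5. Store `eph_pk`, `nonce`, and `ct` as the `ephemeral_pub`, `nonce`, and `wrapped_dek` fields of the `wrapped_dek` JSON object. |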
| 259 | |
| 260 | ### 5.4 AEAD binding |
| 261 | |
| 262 | The ciphertext AEAD takes `AAD = content_hash` (the hex string from the manifest). This binds the ciphertext to the signed manifest; an attacker cannot swap ciphertexts between manifests without breaking the AEAD tag. |
| 263 | |
| 264 | ### 5.5 Post-decrypt verification |
| 265 | |
| 266 | After decryption, the implementation MUST verify that `SHA-256(plaintext) == manifest.content_hash`. If not, discard the plaintext. |
| 267 | |
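A minimal sketch of the post-decrypt check follows. `verify_plaintext` is a hypothetical helper name, and the constant-time comparison is a defensive choice made for this example rather than something the check above mandates.

```python
import hashlib
import hmac


def verify_plaintext(plaintext: bytes, manifest: dict) -> bytes:
    """Return the plaintext only if its SHA-256 matches manifest.content_hash."""
    digest = hashlib.sha256(plaintext).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(digest, manifest["content_hash"]):
        raise ValueError("content_hash mismatch; plaintext discarded")
    return plaintext


demo_plaintext = b"quarterly report"
demo_manifest = {"content_hash": hashlib.sha256(demo_plaintext).hexdigest()}
checked = verify_plaintext(demo_plaintext, demo_manifest)
```

Raising instead of returning a flag makes it harder for a caller to use unverified plaintext by accident.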
| 268 | ## 6. Watermarking |
| 269 | |
| 270 | Watermarking is optional but RECOMMENDED. Each applied layer registers a |
| 271 | `mark_id` in the manifest. L3 semantic watermarking changes visible prose and |
| 272 | is therefore opt-in for wording-sensitive classes. Implementations MUST |
| 273 | default L3 off for legal documents, regulatory filings, technical |
| 274 | specifications, source code, SQL, logs, and structured data unless the user |
| 275 | explicitly enables and acknowledges the textual change. |
| 276 | |
| 277 | ### 6.1 Layer identifiers |
| 278 | |
| 279 | - `L1_zero_width` - zero-width unicode characters scattered through text payloads |
| 280 | - `L2_whitespace` - trailing space vs tab at line endings |
| 281 | - `L3_synonyms` - legacy synonym-class rotation identifier |
| 282 | - `L3_semantic_full` - guarded semantic marks over eligible prose regions |
| 283 | - `L3_semantic_boilerplate` - guarded semantic marks limited to header/footer/cover-page regions |
| 284 | - `L4_dct_visual` - reserved; for image payloads |
| 285 | - `L5_layout` - reserved; for PDF/document layout perturbation |
| 286 | |
| 287 | ### 6.2 Mark IDs |
| 288 | |
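| 289 | Mark IDs are 64-bit random values. With 2^32 marks already issued, the probability that one newly generated mark collides with an existing mark is ~2^-32. By the birthday bound, the probability that any two of 2^32 marks collide is much higher, roughly 39%. |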
| 290 | |
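The two collision figures for 64-bit mark IDs measure different things, and a few lines of arithmetic make the gap concrete. The function names below are hypothetical; the any-pair estimate uses the standard birthday approximation 1 - exp(-n(n-1)/(2N)).

```python
import math
import secrets


def new_mark_id() -> int:
    """Draw a 64-bit mark ID from the OS CSPRNG."""
    return secrets.randbits(64)


def next_mark_collision_probability(n_marks: int, bits: int = 64) -> float:
    """Chance that ONE fresh mark equals any of n_marks existing marks."""
    return n_marks / 2 ** bits


def any_collision_probability(n_marks: int, bits: int = 64) -> float:
    """Birthday approximation: chance that ANY two of n_marks marks coincide."""
    space = 2 ** bits
    return -math.expm1(-n_marks * (n_marks - 1) / (2 * space))


per_mark = next_mark_collision_probability(2 ** 32)  # equals 2**-32
any_pair = any_collision_probability(2 ** 32)        # about 0.39
```

At one million issued marks the any-pair probability is still below 10^-7, so the birthday effect only matters for very large registries.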
| 291 | ### 6.3 Recovery |
| 292 | |
| 293 | A leaked plaintext is scanned by all supported layer extractors. Each recovered `mark_id` is queried against the registry. A match returns `(file_id, recipient_id, issuer_id)`. |
| 294 | |
| 295 | Implementations SHOULD use multiple layers so that defeating one does not defeat attribution. |
| 296 | |
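To make the embed and recover cycle concrete, here is a toy L1 zero-width codec. The encoding is an assumption made purely for illustration and is not defined by this specification: U+200B carries a 0 bit, U+200C a 1 bit, and all 64 bits are written as one run after the first word, whereas a production embedder would scatter and repeat them.

```python
from typing import Optional

ZERO, ONE = "\u200b", "\u200c"  # zero-width space, zero-width non-joiner


def embed_l1(text: str, mark_id: int) -> str:
    """Hide a 64-bit mark as zero-width characters after the first word."""
    payload = "".join(ONE if bit == "1" else ZERO for bit in format(mark_id, "064b"))
    head, sep, tail = text.partition(" ")
    return head + payload + sep + tail


def extract_l1(text: str) -> Optional[int]:
    """Recover the first 64 embedded bits, or None if the text carries no mark."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    return int(bits[:64], 2) if len(bits) >= 64 else None


def strip_l1(text: str) -> str:
    """Remove the zero-width payload, leaving the visible text."""
    return text.replace(ZERO, "").replace(ONE, "")


original = "Quarterly results attached."
mark = 0x0123456789ABCDEF
marked = embed_l1(original, mark)
recovered = extract_l1(marked)
```

The recovered value is what would be sent to the registry lookup. A single-run encoding like this one is defeated by any filter that strips zero-width characters, which is why multiple layers are recommended.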
| 297 | ## 7. Beacons |
| 298 | |
| 299 | ### 7.1 Types |
| 300 | |
| 301 | | Kind | Channel | Triggered by | |
| 302 | |------------|---------|-------------------------------------------------------| |
| 303 | | `dns` | DNS | Document rendering, network-aware readers, preview pipelines | |
| 304 | | `http_img` | HTTPS | `<img>` tags in HTML/Office/PDF/SVG | |
| 305 | | `ocsp` | HTTPS | Certificate revocation checks | |
| 306 | | `license` | HTTPS | Explicit license-server check (policy-enforced) | |
| 307 | |
| 308 | ### 7.2 Token format |
| 309 | |
| 310 | Each beacon carries a 128-bit unguessable `token_id`. The registry maps `token_id → (file_id, recipient_id, issuer_id)`. |
| 311 | |
| 312 | ### 7.3 Passive-only requirement |
| 313 | |
| 314 | Beacons MUST NOT cause code execution on the reader. A beacon is a network callback that a standard renderer makes naturally; it does not require a plugin, macro, or active payload. |
| 315 | |
| 316 | ## 8. Registry |
| 317 | |
| 318 | ### 8.1 Endpoints |
| 319 | |
| 320 | A compliant registry exposes: |
| 321 | |
| 322 | | Method | Path | Purpose | |
| 323 | |--------|----------------------------|-----------------------------------------| |
| 324 | | POST | `/register` | Issuer registers a file's beacons+marks | |
| 325 | | GET | `/p/{token_id}.png` | HTTP image beacon receiver | |
| 326 | | GET | `/r/{token_id}` | OCSP-style beacon receiver | |
| 327 | | GET | `/v/{token_id}` | License-check beacon receiver | |
| 328 | | POST | `/attribute` | Query by token_id or mark_id | |
| 329 | | GET | `/evidence/{file_id}` | Assemble evidence bundle | |
| 330 | |
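A sketch of how an issuer might mint a beacon token and derive the receiver URLs from the endpoint table above. The hex encoding of `token_id`, the helper names, and the `https://registry.example` base URL are assumptions for illustration; only the path templates come from the table.

```python
import secrets

# Path templates taken from the endpoint table; keyed by beacon kind.
BEACON_PATHS = {
    "http_img": "/p/{token_id}.png",
    "ocsp": "/r/{token_id}",
    "license": "/v/{token_id}",
}


def new_token_id() -> str:
    """Return a 128-bit unguessable token as 32 lowercase hex characters."""
    return secrets.token_hex(16)


def beacon_url(registry_url: str, kind: str, token_id: str) -> str:
    """Build the callback URL for one beacon kind on a given registry."""
    return registry_url.rstrip("/") + BEACON_PATHS[kind].format(token_id=token_id)


token = new_token_id()
img_url = beacon_url("https://registry.example/", "http_img", token)
```

DNS beacons are not listed here because they resolve through the DNS channel rather than an HTTP path.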
| 331 | ### 8.2 Qualified timestamps |
| 332 | |
| 333 | Production registries MUST timestamp events via RFC 3161 against at least one qualified Time Stamping Authority (TSA). Evidence bundles MUST include the TimeStampToken(s). |
| 334 | |
| 335 | ### 8.3 Transparency log |
| 336 | |
| 337 | Production registries SHOULD chain events into an append-only transparency log (Sigstore-style Merkle log) so that registry operators cannot fabricate or suppress events undetected. |
| 338 | |
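The tamper-evidence property of an append-only Merkle log can be illustrated with a root computation in the style of RFC 6962, which Sigstore-style logs are generally modeled on. This is an assumption about the log construction, not something this specification fixes; `merkle_root` is a hypothetical helper and the event strings are made up.

```python
import hashlib
from typing import List


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    """Merkle tree hash with RFC 6962 domain separation (0x00 leaf, 0x01 node)."""
    if not leaves:
        return _sha256(b"")
    if len(leaves) == 1:
        return _sha256(b"\x00" + leaves[0])
    # Split at the largest power of two strictly smaller than the leaf count.
    k = 1
    while k * 2 < len(leaves):
        k *= 2
    return _sha256(b"\x01" + merkle_root(leaves[:k]) + merkle_root(leaves[k:]))


events = [b"register file-1", b"beacon token-a", b"beacon token-b"]
root = merkle_root(events)
# Rewriting or dropping any past event changes the published root.
root_rewritten = merkle_root([b"register file-1", b"beacon token-X", b"beacon token-b"])
root_suppressed = merkle_root(events[:2])
```

If the registry publishes each root, for example inside the qualified timestamps, an operator cannot alter or drop an earlier event without producing a root that disagrees with the published one.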
| 339 | ### 8.4 Jurisdictional profiles |
| 340 | |
| 341 | Registries MUST publish a jurisdictional profile declaring: |
| 342 | |
| 343 | - Data residency (where event logs are stored) |
| 344 | - Permitted field collection per event (IP, UA, geolocation, etc.) |
| 345 | - Retention period |
| 346 | - Cross-border data-sharing policy |
| 347 | |
| 348 | The manifest `policy.jurisdiction` MUST match the registry's profile or the seal MUST be rejected. |
| 349 | |
| 350 | ## 9. Evidence bundles |
| 351 | |
| 352 | An evidence bundle is a JSON artifact containing: |
| 353 | |
| 354 | 1. The original signed manifest |
| 355 | 2. All registered beacons and watermarks |
| 356 | 3. Chronologically ordered event log |
| 357 | 4. Qualified timestamps for each event |
| 358 | 5. Registry's own signature over the bundle |
| 359 | 6. Transparency-log inclusion proofs |
| 360 | |
| 361 | The bundle is the foundation for a forensic report per ISO/IEC 27037. A court-admissible final report requires additional human-in-the-loop procedures: examiner qualifications, methodology documentation, and proper preservation of the original blob. |
| 362 | |
| 363 | ## 10. Security considerations |
| 364 | |
| 365 | ### 10.1 Key compromise |
| 366 | |
| 367 | - Issuer key compromise allows forged manifests for the compromise window. Mitigation: short-lived issuer keys, certificate transparency, and a revocation list. |
| 368 | - Recipient key compromise allows decryption of all files ever sealed for that recipient. Mitigation: per-purpose recipient keys, forward-secret variants (future work). |
| 369 | |
| 370 | ### 10.2 Replay |
| 371 | |
| 372 | Ciphertext is bound to manifest via AEAD AAD. Manifest is signed and uniquely identified by `file_id`. Replay of a full sealed blob is equivalent to possession of the blob. |
| 373 | |
| 374 | ### 10.3 Side channels |
| 375 | |
| 376 | Implementations MUST use constant-time implementations for all cryptographic primitives. Watermark-embedding timing may leak whether a recipient is being marked; embed times SHOULD be bounded. |
| 377 | |
| 378 | ### 10.4 Metadata exposure |
| 379 | |
| 380 | The manifest is not encrypted. An attacker who captures a sealed blob learns the recipient, issuer, beacons, and watermark IDs. This is intentional: third parties (legal discovery, compliance auditors) must be able to inspect the metadata without holding the decryption key. Sensitive fields SHOULD be hashed or omitted from the manifest if their disclosure is unacceptable. |
| 381 | |
| 382 | ### 10.5 Traffic analysis of beacons |
| 383 | |
| 384 | Beacon callbacks reveal that a sealed file was opened. In hostile environments an attacker who blocks outbound traffic will suppress beacon callbacks. The protocol does not claim to defeat such an attacker; watermarking provides the post-escape attribution path. |
| 385 | |
| 386 | ## 11. IANA considerations |
| 387 | |
| 388 | - Reserved media type: `application/vnd.oversight.sealed` |
| 389 | - Reserved file extension: `.sealed` |
| 390 | - Reserved URN namespace: `urn:oversight:file:<file_id>` |
| 391 | |
| 392 | ## 12. References |
| 393 | |
| 394 | - FIPS 203: Module-Lattice-Based Key-Encapsulation Mechanism |
| 395 | - FIPS 204: Module-Lattice-Based Digital Signature Standard |
| 396 | - RFC 7748: Elliptic Curves for Security (X25519) |
| 397 | - RFC 8032: Edwards-Curve Digital Signature Algorithm (EdDSA) |
| 398 | - RFC 5869: HKDF |
| 399 | - RFC 3161: Time-Stamp Protocol (TSP) |
| 400 | - ISO/IEC 27037: Guidelines for identification, collection, acquisition and preservation of digital evidence |
| 401 | - C2PA 2.3: Content Credentials specification |
| 402 | - draft-irtf-cfrg-xchacha: XChaCha20-Poly1305 |
| 403 | |
| 404 | ## 13. Appendix A - Test vectors (normative) |
| 405 | |
| 406 | Cross-language conformance scripts live at `oversight-rust/tests/conformance_*.sh` |
| 407 | and assert byte-identical seal/open and Rekor DSSE/PAE between the Python |
| 408 | reference and the Rust port. Implementations SHOULD run them on every change |
| 409 | and SHOULD add published byte-exact vectors for every suite they ship. |