| 1 | # Oversight Protocol |
| 2 | |
| 3 | **Open protocol for cryptographic data provenance, recipient attribution, and leak detection.** |
| 4 | |
| 5 | Co-authored by Zion Boggan, Claude Opus 4.6/4.7 (Anthropic), and Codex (GPT-5.4, OpenAI). |
| 6 | |
| 7 | Format-agnostic. Post-quantum ready (ML-KEM-768 + ML-DSA-65). Layered watermarking with honest limits: L1/L2 are lightweight steganographic signals, L3 is opt-in semantic marking for prose, and content fingerprinting helps identify leaked copies even when fragile marks are destroyed. |
| 8 | |
| 9 | No cloud vendor lock-in. No paid service required. No custom cryptography. Apache 2.0. |
| 10 | |
| 11 | **Website:** [https://oversightprotocol.dev/](https://oversightprotocol.dev/) |
| 12 | **Mobile companion (verifier):** [oversight-protocol/oversight-mobile](https://github.com/oversight-protocol/oversight-mobile) - Flutter UI on top of the same Rust crates that power this CLI, currently in internal TestFlight beta. |
| 13 | **Outlook add-in pilot:** [oversightprotocol.dev/integrations/outlook/](https://oversightprotocol.dev/integrations/outlook/) - a read-mode task pane that verifies and decrypts sealed attachments with the same browser inspector modules. |
| 14 | |
| 15 | --- |
| 16 | |
| 17 | ## Install |
| 18 | |
| 19 | Requires Python 3.10+. |
| 20 | |
| 21 | > **Not comfortable at the command line?** Oversight ships a small desktop app. After the `pip install` step below, run `oversight gui` (or `oversight-gui`) and follow the [GUI quick start](#gui-quick-start). All seal, open, and key-generation workflows are available through the GUI without ever opening a terminal again. |
| 22 | |
| 23 | ```bash |
| 24 | # Clone the repo |
| 25 | git clone https://github.com/oversight-protocol/oversight.git |
| 26 | cd oversight |
| 27 | |
| 28 | # Install (adds the `oversight` command to your PATH) |
| 29 | pip install . |
| 30 | |
| 31 | # Verify |
| 32 | oversight status |
| 33 | ``` |
| 34 | |
| 35 | That's it. The `oversight` command is now available globally. The `oversight-gui` desktop app entry point is installed at the same time. |
| 36 | |
| 37 | On Debian, Ubuntu, and derivatives, Tkinter is packaged separately. Install it once so the GUI can launch: `sudo apt install python3-tk`. On Windows and macOS, Tkinter ships with the standard CPython installer. |
| 38 | |
| 39 | ### Optional extras |
| 40 | |
| 41 | ```bash |
| 42 | # Include registry server (FastAPI) |
| 43 | pip install ".[registry]" |
| 44 | |
| 45 | # Include format adapters (PDF, DOCX, image watermarking) |
| 46 | pip install ".[formats]" |
| 47 | |
| 48 | # Everything |
| 49 | pip install ".[all]" |
| 50 | ``` |
| 51 | |
| 52 | ## Live Registry Deployment |
| 53 | |
| 54 | The reference registry ships with a public-safe Compose/Caddy deployment path. |
| 55 | Start `oversight-registry` on loopback for local operation, then enable the |
| 56 | `live` profile when DNS is ready and Caddy should terminate TLS for the |
| 57 | registry, beacon, OCSP-style, and license-style hostnames. |
| 58 | |
| 59 | ```bash |
| 60 | cp .env.example .env |
| 61 | docker compose up -d oversight-registry |
| 62 | docker compose --profile live up -d |
| 63 | ``` |
| 64 | |
| 65 | Set `OVERSIGHT_DNS_EVENT_SECRET` and `OVERSIGHT_OPERATOR_TOKEN` in `.env` |
| 66 | before exposing a public host. The operator token protects `POST /register` |
| 67 | and `POST /attribute`; the DNS secret authenticates `/dns_event` bridge |
| 68 | callbacks. The Python FastAPI registry and Rust Axum registry both honor |
| 69 | the same bearer/header token contract. Full route map and validation commands are in |
| 70 | [`docs/REGISTRY_DEPLOYMENT.md`](docs/REGISTRY_DEPLOYMENT.md). |
| 71 | |
| 72 | Operators moving from the Python registry to the Rust Axum registry can run a |
| 73 | dry-run copy first: |
| 74 | |
| 75 | ```bash |
| 76 | oversight-registry --db rust-registry.sqlite \ |
| 77 | --migrate-from python-registry.sqlite \ |
| 78 | --migrate-dry-run |
| 79 | ``` |
| 80 | |
| 81 | Remove `--migrate-dry-run` to copy manifests, beacons, watermarks, events, and |
| 82 | corpus rows into the Rust database, then run: |
| 83 | |
| 84 | ```bash |
| 85 | oversight-registry --db rust-registry.sqlite --validate-db |
| 86 | ``` |
| 87 | |
| 88 | The validator reports orphan rows, manifest/signature failures, and identity |
| 89 | mismatches before an operator treats the migrated Rust database as live. |
| 90 | |
| 91 | ## Current main after v0.4.11 |
| 92 | |
| 93 | `main` is past the v0.4.11 hardware-key tag. Changes since that tag: |
| 94 | |
| 95 | - The live registry deployment path now includes the Compose/Caddy `live` |
| 96 |   profile, `.env.example` operator secrets, and shared write-side token |
| 97 |   enforcement across the Python FastAPI and Rust Axum registries. |
| 98 | - The Rust registry has Python-to-Rust SQLite migration tooling (`--migrate-from`, |
| 99 |   `--migrate-dry-run`) and a native `--validate-db` integrity report, so operators |
| 100 |   can preflight, copy, and verify attribution rows, event metadata, corpus metadata, |
| 101 |   and tlog indexes without treating the Python reference as a permanent production dependency. |
| 102 | - Rust registry writes fail closed if the local transparency log cannot |
| 103 |   append, so new evidence rows cannot silently lose their audit trail. |
| 104 | - The validator checks that event rows point at matching tlog leaf payloads, |
| 105 |   not just in-range indexes. |
| 106 | - The local transparency log fails closed when recovered leaf records are |
| 107 |   malformed, out of sequence, or hash-mismatched. |
| 108 | - The Rust registry's `/tlog/range` endpoint uses those validated leaf records |
| 109 |   too, so federated monitors cannot receive a partial range with corrupted lines |
| 110 |   silently skipped. |
| 111 | - The Python reference registry has matching tlog recovery and range validation, |
| 112 |   including exact `leaf_data_hex` persistence for newly appended leaves. |
| 113 | - `docs/REGISTRY_V1_STABILITY.md` records the registry v1.0 candidate |
| 114 |   wire-format freeze: the endpoint set, error envelope, canonicalization rules, |
| 115 |   sidecar authority rule, and conformance gate are the operator burn-in target. |
| 116 | |
| 117 | The next Rust-registry gate is operational burn-in: longer-running deployment |
| 118 | tests against real operator databases and a final v1.0 release tag against the |
| 119 | candidate profile. |
| 120 | |
| 121 | ## Quick start |
| 122 | |
| 123 | ```bash |
| 124 | # 1. Initialize a project directory |
| 125 | mkdir my-project && cd my-project |
| 126 | oversight init |
| 127 | |
| 128 | # 2. Generate your issuer identity |
| 129 | oversight keys generate --name zion |
| 130 | |
| 131 | # 3. Generate a recipient identity (they would do this on their machine) |
| 132 | oversight keys generate --name alice --out alice.json |
| 133 | |
| 134 | # 4. Import the recipient's public key |
| 135 | oversight keys import alice.pub.json |
| 136 | |
| 137 | # 5. Seal a document to the recipient (watermarks embedded by default) |
| 138 | oversight seal report.txt --to alice |
| 139 | |
| 140 | # 6. The recipient opens the sealed file |
| 141 | oversight open report.txt.sealed --out report-decrypted.txt |
| 142 | |
| 143 | # 7. If the document leaks, attribute it |
| 144 | oversight attribute --leak leaked.txt --fingerprints .oversight/fingerprints |
| 145 | ``` |
| 146 | |
| 147 | ### GUI quick start |
| 148 | |
| 149 | If you would rather click than type, run `oversight gui` (or the equivalent `oversight-gui` entry point). A single desktop window opens with three tabs that cover the same workflows as the CLI steps above. |
| 150 | |
| 151 | 1. **Generate Keys.** Pick an identity name and an output path, then press **Generate Keypair**. The GUI writes your private identity JSON (with `0600` permissions on POSIX or a best-effort Windows ACL narrowing) and a sibling `.pub.json` that you hand to any issuer who needs to seal something to you. |
| 152 | 2. **Seal File.** Choose the input plaintext, the issuer's private key, and the recipient's `.pub.json`. Pick an L3 mode (`auto` is safest), leave the L1 and L2 watermark checkbox on, and press **Seal**. The GUI writes a `.sealed` container and a `.fingerprint.json` sidecar next to it. If L3 is about to rewrite body text, the GUI asks for explicit acknowledgement first, matching the CLI's `--l3-ack` gate. |
| 153 | 3. **Open File.** Choose a `.sealed` file, your private identity, and an output path. The GUI verifies the manifest signature, enforces policy, decrypts the content, and writes the plaintext to the chosen location. |
| 154 | |
| 155 | The full walk-through, including every field, the L3 disclosure flow, and troubleshooting for common issues, is at [oversightprotocol.dev/docs/gui.html](https://oversightprotocol.dev/docs/gui.html). |
| 156 | |
| 157 | ### What happens when you seal |
| 158 | |
| 159 | The seal command applies watermark layers to the document, each targeting a different attack surface: |
| 160 | |
| 161 | - **L1** inserts zero-width Unicode characters (survives copy-paste) |
| 162 | - **L2** encodes bits in trailing whitespace patterns (survives most editors) |
| 163 | - **L3** optionally rotates prose choices from a 151-class dictionary (survives format conversion and screenshot/OCR, but changes visible text and can be defeated by motivated collusion/canonicalization) |
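
To make the L1 and L2 ideas concrete, here is a minimal, self-contained sketch of both techniques. It is not the encoding used by `oversight_core/watermark.py`: the choice of U+200B/U+200C for L1, the "space = 0, tab = 1" rule for L2, and every function name are assumptions made for illustration only.

```python
# Illustrative only: character choices and bit layout are assumptions,
# not the Oversight wire format.
ZW0, ZW1 = "\u200b", "\u200c"  # zero-width space = 0, zero-width non-joiner = 1


def to_bits(mark: bytes) -> list[int]:
    """Expand bytes into bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in mark for shift in range(7, -1, -1)]


def from_bits(bits: list[int]) -> bytes:
    """Pack bits (MSB first) back into bytes, ignoring a trailing partial byte."""
    usable = len(bits) - len(bits) % 8
    return bytes(
        int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, usable, 8)
    )


def embed_l1(text: str, mark: bytes) -> str:
    """L1-style: hide the mark as zero-width characters after the first word."""
    payload = "".join(ZW1 if bit else ZW0 for bit in to_bits(mark))
    head, sep, tail = text.partition(" ")
    return head + payload + sep + tail


def extract_l1(text: str) -> bytes:
    """Collect every zero-width marker character and decode it."""
    return from_bits([1 if ch == ZW1 else 0 for ch in text if ch in (ZW0, ZW1)])


def embed_l2(text: str, mark: bytes) -> str:
    """L2-style: one bit per line, carried by the trailing whitespace character."""
    bits = to_bits(mark)
    out = []
    for i, line in enumerate(text.split("\n")):
        line = line.rstrip(" \t")
        if i < len(bits):
            line += "\t" if bits[i] else " "
        out.append(line)
    return "\n".join(out)


def extract_l2(text: str) -> list[int]:
    """Read bits back from lines that end in a tab (1) or a space (0)."""
    bits = []
    for line in text.split("\n"):
        if line.endswith("\t"):
            bits.append(1)
        elif line.endswith(" "):
            bits.append(0)
    return bits
```

Both marks are invisible in most renderers, which is also why they are fragile: any tool that strips zero-width characters or trims trailing whitespace removes them. That is the gap L3 and content fingerprinting are meant to cover.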
| 164 | |
| 165 | L3 defaults off for legal documents, regulatory filings, technical specifications, source code, SQL, logs, and structured data. When L3 is enabled, Oversight asks for explicit acknowledgement and records `canonical_content_hash` in the signed manifest so disputes can compare the recipient copy against the canonical source. |
| 166 | |
| 167 | Then it encrypts to the recipient's X25519 public key, timestamps via RFC 3161, logs to the Merkle tree, and writes the `.sealed` file plus a `.fingerprint.json` sidecar for the content fingerprint database. |
| 168 | |
| 169 | Oversight currently emits one sealed file per recipient. Multi-recipient |
| 170 | sealing is intentionally disabled until the manifest format can bind |
| 171 | multiple recipients without weakening attribution evidence. |
| 172 | |
| 173 | ### What happens when you attribute |
| 174 | |
| 175 | The attribute command runs a 5-phase pipeline: |
| 176 | |
| 177 | 1. **Direct extraction** of L1/L2 marks from the leaked text |
| 178 | 2. **Registry query** for candidate mark IDs |
| 179 | 3. **L3 semantic verification** against candidates (synonym score + punctuation + spelling + contractions) |
| 180 | 4. **Multi-layer Bayesian fusion** combining all evidence into ranked candidates |
| 181 | 5. **Content fingerprint comparison** (winnowing + sentence hashing) as a last resort when all watermarks are stripped |
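
Phase 5 can be illustrated with a small winnowing sketch: hash every k-gram of the normalized text, keep the minimum hash in each sliding window, and compare the resulting fingerprint sets. The normalization rule, the `k` and `w` values, the truncated SHA-256 hash, and the Jaccard score are all assumptions for this example; the real pipeline also uses sentence hashing, which is not shown.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Keep only lowercase letters and digits, so invisible marks,
    whitespace, and punctuation do not affect the fingerprint."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def kgram_hashes(text: str, k: int) -> list[int]:
    """Hash every k-character substring of the normalized text."""
    norm = normalize(text)
    return [
        int.from_bytes(hashlib.sha256(norm[i:i + k].encode()).digest()[:8], "big")
        for i in range(len(norm) - k + 1)
    ]


def winnow(text: str, k: int = 8, w: int = 4) -> set[int]:
    """Select the minimum hash from each window of w consecutive k-gram hashes."""
    hashes = kgram_hashes(text, k)
    if not hashes:
        return set()
    if len(hashes) < w:
        return {min(hashes)}
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two fingerprint sets (0.0 to 1.0)."""
    fa, fb = winnow(a), winnow(b)
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)
```

Because the fingerprint is computed from the visible content, a leaked copy whose L1/L2 marks have been stripped still scores high against the stored fingerprint of the copy it was derived from.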
| 182 | |
| 183 | ## What's new in v0.4.11 |
| 184 | |
| 185 | **Hardware-keys completion across every reference implementation.** v0.4.11 |
| 186 | finishes what v0.4.10 started. The `OSGT-HW-P256-v1` suite now ships |
| 187 | end-to-end in `oversight_core.crypto` (Python: `wrap_dek_for_recipient_p256`, |
| 188 | `unwrap_dek_p256` accepting `EllipticCurvePrivateKey`, PKCS#8 bytes, or raw |
| 189 | integer scalars), in `oversight-container` (`seal_hw_p256` + |
| 190 | `open_sealed_with_provider` polymorphic dispatch on `suite_id`), in the |
| 191 | manifest schema (`Recipient.p256_pub` optional field, deserialization |
| 192 | back-compatible), and in the public browser inspector at |
| 193 | <https://oversightprotocol.dev/viewer/> via vendored `@noble/curves` P-256 |
| 194 | ECDH. Every existing classic and hybrid call site is unchanged. The |
| 195 | container's existing rule that the unsigned `suite_id` header must match |
| 196 | the signed `manifest.suite` covers cross-suite-mixing attacks for free. |
| 197 | A new `tools/gen_hw_p256_sample.py` produces the public viewer's |
| 198 | `tutorial-hw-p256.sealed` fixture without needing `oqs` or hardware. The |
| 199 | last piece of the hardware story, `PivKeyProvider` against PKCS#11, is |
| 200 | the next bounded follow-up. |
| 201 | |
| 202 | ## What's new in v0.4.10 |
| 203 | |
| 204 | **Hardware-keys foundation.** `oversight-crypto` now exposes a |
| 205 | `KeyProvider` trait that abstracts the recipient-side ECDH so a |
| 206 | hardware-backed token (YubiKey / Nitrokey / OnlyKey via PIV) can plug |
| 207 | into the open path without changing call sites. `FileKeyProvider` |
| 208 | ships as the X25519 default. The hardware-track suite |
| 209 | `OSGT-HW-P256-v1` is fully implemented in software: |
| 210 | `wrap_dek_for_recipient_p256` + `WrappedDekP256` + |
| 211 | `SoftwareP256KeyProvider` (NIST P-256 ECDH, RustCrypto's `p256` |
| 212 | crate). `oversight-container` recognizes the new suite id (`3`) so |
| 213 | sealed files for hardware recipients ride the existing 1-byte header |
| 214 | dispatch without a layout change. The `PivKeyProvider` (PKCS#11) |
| 215 | implementation is the next bounded follow-up; the trait and software |
| 216 | reference let it ship without touching seal-side or container code. |
| 217 | Full crate test count is 21/21 in `oversight-crypto` and 12/12 in |
| 218 | `oversight-container`. Public API additive; v0.4.9 callers unchanged. |
| 219 | |
| 220 | ## What's new in v0.4.9 |
| 221 | |
| 222 | **Browser inspector decrypts hybrid (post-quantum) sealed files.** |
| 223 | The viewer at <https://oversightprotocol.dev/viewer/> now decrypts |
| 224 | both `OSGT-CLASSIC-v1` and `OSGT-HYBRID-v1` files end-to-end. The |
| 225 | hybrid path uses WebCrypto for X25519 + HKDF-SHA256, a vendored |
| 226 | `@noble/ciphers` for XChaCha20-Poly1305, and a vendored |
| 227 | `@noble/post-quantum` ML-KEM-768 for the post-quantum half of the |
| 228 | KEM. The KEK is bound X-wing-style over both shared secrets and |
| 229 | both ephemeral inputs (`ss_x || ss_pq || eph_pub || mlkem_ct`), |
| 230 | matching `oversight_core.crypto.hybrid_wrap_dek` byte for byte. |
| 231 | All vendored libraries ship with rewritten relative imports so the |
| 232 | inspector remains fully offline-capable. Try it with the new |
| 233 | "Load hybrid tutorial identity" button against `tutorial-hybrid.sealed`. |
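
The binding described above can be sketched with a standard-library HKDF-SHA256 (RFC 5869). Only the concatenation order `ss_x || ss_pq || eph_pub || mlkem_ct` comes from this README. Using that concatenation as the HKDF input keying material, the empty salt, and the `info` label are assumptions, so this will not reproduce `hybrid_wrap_dek` output byte for byte.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF with SHA-256 (RFC 5869): extract a PRK, then expand it."""
    # RFC 5869: an absent salt defaults to HashLen zero bytes.
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        # T(i) = HMAC(PRK, T(i-1) || info || i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_kek(
    ss_x: bytes,       # X25519 shared secret (32 bytes)
    ss_pq: bytes,      # ML-KEM-768 shared secret (32 bytes)
    eph_pub: bytes,    # sender's ephemeral X25519 public key (32 bytes)
    mlkem_ct: bytes,   # ML-KEM-768 ciphertext (1088 bytes)
    *,
    info: bytes = b"example-kek-info",  # hypothetical label, not the real one
) -> bytes:
    """Derive a 32-byte KEK bound to both secrets and both ephemeral inputs."""
    ikm = ss_x + ss_pq + eph_pub + mlkem_ct
    return hkdf_sha256(ikm, salt=b"", info=info)
```

Mixing the ephemeral public key and the KEM ciphertext into the derivation means the KEK changes if either half of the exchange is swapped or tampered with, not only if a shared secret differs.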
| 234 | |
| 235 | **Rust registry v1 conformance.** `oversight-rust/oversight-registry` |
| 236 | now exposes the full read-only and beacon surface |
| 237 | (`/.well-known/oversight-registry`, `/evidence/{file_id}`, |
| 238 | `/tlog/head|proof|range`, `/p/{token_id}.png`, `/r/{token_id}`, |
| 239 | `/v/{token_id}`, `/candidates/semantic`) and ships strict CORS |
| 240 | restricted to the public browser-inspector origins with GET and |
| 241 | OPTIONS only. The Axum server now passes `tests/test_registry_conformance.py` |
| 242 | (38/38) in live-URL mode. `oversight-rust/oversight-manifest` learned |
| 243 | to verify Python-signed v0.4.5+ manifests by carrying |
| 244 | `canonical_content_hash` and `l3_policy` in the signed model, with |
| 245 | a fallback path for older manifests that lack those fields. |
| 246 | |
| 247 | **Format watermark round-trip fixes.** `oversight-rust/oversight-formats` |
| 248 | text embedding now keeps L2 trailing-whitespace marks at physical |
| 249 | line endings after L1 zero-width insertion, image LSB embedding |
| 250 | no longer overwrites earlier payload bits via duplicate pixel |
| 251 | slots, and current `main` adds DCT mid-band spread-spectrum image |
| 252 | watermarking to match the Python reference path. Workspace test suite |
| 253 | is green again. |
| 254 | |
| 255 | ## What's new in v0.4.8 |
| 256 | |
| 257 | **Mobile-build portability and security bump.** Patch release. The |
| 258 | Rust core's 4 GiB ciphertext-size cap is now gated to 64-bit targets |
| 259 | and falls back to `usize::MAX` on 32-bit, which is what unblocks the |
| 260 | mobile companion's `armv7` and `i686` Android builds (the desktop CLI |
| 261 | and registry are unchanged). `rustls-webpki` lifted to 0.103.13 to |
| 262 | pick up the GHSA-82j2-j2ch-gfr8 CRL parse fix and a corrected URI |
| 263 | name-constraint check; both apply to our Rekor TLS path. |
| 264 | |
| 265 | ## What's new in v0.4.7 |
| 266 | |
| 267 | **Registry federation hardening.** `docs/spec/registry-v1.md` now |
| 268 | specifies the canonicalization algorithm, the uniform error envelope |
| 269 | and code vocabulary, the full endpoint list including the normative |
| 270 | beacon paths, the `/.well-known/oversight-registry` shape, and the |
| 271 | `/evidence` bundle fields. The spec matches what the reference |
| 272 | registry actually serves, so an independent implementation can target |
| 273 | something real instead of something aspirational. |
| 274 | |
| 275 | **Conformance harness.** `tests/test_registry_conformance.py` is a |
| 276 | 32-check test that runs either against the reference registry |
| 277 | in-process (CI) or against any live URL |
| 278 | (`OVERSIGHT_REGISTRY_URL=https://registry.example.org python3 |
| 279 | tests/test_registry_conformance.py`). An independent operator who |
| 280 | passes the harness can claim v1 compatibility. |
| 281 | |
| 282 | ## What's new in v0.4.6 |
| 283 | |
| 284 | **SIEM export.** Registry beacon events can now be emitted in three |
| 285 | SIEM-native formats: Splunk HEC envelopes, Elastic Common Schema 8.x |
| 286 | documents, and Microsoft Sentinel Log Analytics custom-log rows. The new |
| 287 | `oversight_core.siem` module ships pure formatters, a normalized |
| 288 | `OversightEvent` model built from the registry `events` table, file and |
| 289 | stdout and HTTP sinks, and a Sentinel HMAC signing helper. |
| 290 | |
| 291 | **`oversight siem export` CLI.** Streams events as JSON lines to stdout, |
| 292 | a file, or a generic HTTPS collector. Supports `--since`, `--limit`, |
| 293 | repeatable `--header`, and Splunk source/sourcetype/index overrides. |
| 294 | Opens the registry database in read-only mode so it is safe to run |
| 295 | against a live service. Full operator guide at `docs/SIEM.md`. |
| 296 | |
| 297 | ## What's new in v0.4.5 |
| 298 | |
| 299 | **L3 safety and usability.** Semantic watermarking is now format-aware and |
| 300 | opt-in for sensitive classes, with full/boilerplate/off modes, disclosure |
| 301 | acknowledgement, canonical source hashing, protected-region skips, and explicit |
| 302 | collusion/threat-model documentation in `docs/security.md`. |
| 303 | |
| 304 | **GUI starter.** `oversight gui` launches a small desktop app for key |
| 305 | generation, sealing, and opening files so non-technical recipients are not |
| 306 | forced through the CLI. The GUI and CLI now guard local writes so seal/open |
| 307 | outputs cannot overwrite selected input files or Oversight private-key JSON; |
| 308 | private-key generation uses atomic replacement and restrictive permissions or |
| 309 | best-effort Windows ACL hardening. |
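
The write guards described above follow a common pattern that can be sketched with the standard library: write to a temporary file in the target directory, restrict its permissions, then atomically rename it into place. The function names here are hypothetical and the Windows ACL step is omitted; this is not the code the GUI or CLI actually runs.

```python
import json
import os
import tempfile


def would_overwrite(output_path: str, protected_paths: list[str]) -> bool:
    """True if output_path resolves to any protected input or key file."""
    out = os.path.realpath(output_path)
    return any(out == os.path.realpath(p) for p in protected_paths)


def write_private_json(path: str, payload: dict) -> None:
    """Write JSON so readers never observe a partial file or loose permissions."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so os.replace stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".json")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())
        if os.name == "posix":
            os.chmod(tmp, 0o600)  # owner read/write only
        os.replace(tmp, path)     # atomic swap on POSIX and Windows
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

A caller would check `would_overwrite(out_path, [input_path, key_path])` before writing and refuse to proceed when it returns `True`.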
| 310 | |
| 311 | **Registry federation draft.** `docs/spec/registry-v1.md` documents the |
| 312 | interoperability contract for compatible registry operators. |
| 313 | |
| 314 | ## What's new in v0.4.4 |
| 315 | |
| 316 | **Security hardening over v0.4.3.** This line starts from the v0.4.3 Python |
| 317 | package baseline and adds the 2026-04-20 review fixes from Codex (GPT-5.4). |
| 318 | Use v0.4.4 or current `main` for the hardened behavior described below. |
| 319 | |
| 320 | **Signed evidence continuity.** Registry registration now stores only the |
| 321 | beacons and watermarks that match the issuer-signed manifest, Rekor |
| 322 | attestations index by real watermark IDs and actual content hashes, and the |
| 323 | local transparency-log empty root matches RFC 6962. |
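
For reference, the RFC 6962 Merkle tree hash, including the empty-tree case mentioned above, is small enough to sketch in full. This is an independent illustration of the RFC's definition, not the code in `oversight_core/tlog.py`, and the function names are made up for the example.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    """RFC 6962 leaf hash: SHA-256 over a 0x00 prefix plus the leaf data."""
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    """RFC 6962 interior node hash: SHA-256 over a 0x01 prefix plus both children."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Merkle tree hash as defined in RFC 6962 section 2.1."""
    n = len(leaves)
    if n == 0:
        # The empty tree hashes to SHA-256 of the empty string.
        return hashlib.sha256(b"").digest()
    if n == 1:
        return leaf_hash(leaves[0])
    # Split at the largest power of two strictly less than n.
    k = 1
    while k * 2 < n:
        k *= 2
    return node_hash(merkle_root(leaves[:k]), merkle_root(leaves[k:]))


# The empty-tree root is the well-known SHA-256 of zero bytes of input.
assert merkle_root([]).hex() == (
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
)
```

The distinct `0x00` and `0x01` prefixes keep a leaf from being reinterpreted as an interior node, which is what makes inclusion proofs against the root meaningful.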
| 324 | |
| 325 | **Recipient-honest policy enforcement.** `max_opens` counts only successful |
| 326 | recipient decryptions, Windows local counters work, registry-backed counter |
| 327 | modes fail closed until implemented, and unsafe multi-recipient sealing is |
| 328 | disabled until the manifest format can represent multiple recipients honestly. |
| 329 | |
| 330 | ## What's new in v0.4.3 |
| 331 | |
| 332 | **Anti-stripping defenses.** ECC-protected synonym bits (R=7 repetition codes), winnowing content fingerprints, sentence-level content hashing, 25 spelling variant pairs, 30 contraction choices, number formatting marks. The VM-strip-export attack (open in airgapped VM, strip invisible chars, export clean file) is now defended by content fingerprinting. |
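
An R=7 repetition code is simple to show: each payload bit is written seven times and recovered by majority vote, so up to three of the seven copies can be flipped without changing the decoded bit. The sketch below demonstrates only that coding step; how the repeated bits are mapped onto synonym slots is not shown, and the function names are illustrative.

```python
R = 7  # repetition factor, matching the R=7 codes mentioned above


def encode(bits: list[int]) -> list[int]:
    """Repeat every payload bit R times."""
    return [bit for bit in bits for _ in range(R)]


def decode(coded: list[int]) -> list[int]:
    """Majority-vote each complete group of R received bits."""
    usable = len(coded) - len(coded) % R
    return [
        1 if sum(coded[i:i + R]) > R // 2 else 0
        for i in range(0, usable, R)
    ]
```

For example, flipping any three positions in every group of `encode([1, 0, 1, 1, 0])` still decodes to `[1, 0, 1, 1, 0]`. A fourth flip in the same group would flip that decoded bit.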
| 333 | |
| 334 | **Rich interactive CLI.** Colorful terminal interface with progress bars, panels, config management, and streamlined commands. Run `oversight init` to get started. |
| 335 | |
| 336 | **L3 integration.** The 151-class synonym rotation system and punctuation fingerprinting, previously implemented but not wired into the pipeline, are now fully integrated. Multi-layer Bayesian fusion combines L1, L2, and L3 evidence. |
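
One simple way to picture multi-layer fusion is a naive-Bayes combination: start from a uniform prior over candidate recipients, multiply in each layer's likelihood for each candidate, and normalize. The sketch below assumes the layers are conditionally independent and uses a small likelihood floor for candidates a layer did not score. Both choices, along with the function name and the input shape, are assumptions and not a description of the actual fusion code.

```python
import math


def fuse(
    layer_likelihoods: dict[str, dict[str, float]], floor: float = 1e-6
) -> list[tuple[str, float]]:
    """Combine per-layer likelihoods P(evidence | candidate) into ranked posteriors.

    layer_likelihoods maps a layer name (for example "L1") to a mapping of
    candidate ID -> likelihood in (0, 1]. Missing candidates get `floor`.
    """
    candidates = sorted({c for scores in layer_likelihoods.values() for c in scores})
    if not candidates:
        return []
    # Uniform prior: every candidate starts at log-probability 0 (up to a constant).
    log_post = {c: 0.0 for c in candidates}
    for scores in layer_likelihoods.values():
        for c in candidates:
            log_post[c] += math.log(max(scores.get(c, floor), floor))
    # Subtract the peak before exponentiating to avoid underflow.
    peak = max(log_post.values())
    weights = {c: math.exp(v - peak) for c, v in log_post.items()}
    total = sum(weights.values())
    return sorted(
        ((c, w / total) for c, w in weights.items()),
        key=lambda item: item[1],
        reverse=True,
    )
```

Working in log space keeps the product of many small likelihoods numerically stable. With this scheme, a candidate supported by all three layers outranks one supported strongly by only a single layer.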
| 337 | |
| 338 | See `CHANGELOG.md` for full version history. |
| 339 | |
| 340 | ## Security hardening |
| 341 | |
| 342 | These items are included in v0.4.4/v0.4.5 and current `main`: |
| 343 | |
| 344 | - `max_opens` now counts only successful recipient decryptions, not failed key guesses. |
| 345 | - `LOCAL_ONLY` open counters now work on Windows as well as POSIX hosts. |
| 346 | - `REGISTRY` and `HYBRID` policy modes fail closed instead of silently falling back to local counters. |
| 347 | - Rekor offline verification now checks the attested digest against the expected content hash. |
| 348 | - Registry Rekor attestations now index by real watermark mark IDs and the manifest's actual `content_hash`. |
| 349 | - Registry registration now refuses unsigned beacon/watermark sidecars that do not match the issuer-signed manifest. |
| 350 | - Multi-recipient sealing is disabled until a recipient-honest manifest format lands. |
| 351 | - Local transparency-log empty-tree roots now match RFC 6962 exactly. |
| 352 | - Rust registry and format-adapter paths now mirror the Python hardening: |
| 353 | authenticated DNS beacon callbacks, no silent signed-artifact drops, |
| 354 | digest-checked Rekor offline verification, fail-closed Rust `max_opens`, |
| 355 | DOCX keyword insertion, PDF action screening, and parsed PDF text |
| 356 | extraction for fingerprinting. |
| 357 | - L3 semantic watermarking is opt-in for sensitive classes, requires |
| 358 | disclosure acknowledgement when enabled, and records `canonical_content_hash`. |
| 359 | - `.sealed` parsing rejects suite-byte tamper, malformed manifest or wrapped-DEK |
| 360 | JSON, unknown manifest fields, and trailing bytes after ciphertext. |
| 361 | - Dependency floors now exclude known vulnerable PyPI and Rust manifest ranges |
| 362 | flagged by Dependabot/advisory checks. |
| 363 | |
| 364 | ## Repository layout |
| 365 | |
| 366 | ``` |
| 367 | oversight/ Python reference (5,689 LOC) |
| 368 | ├── oversight_core/ |
| 369 | │ ├── crypto.py X25519 + Ed25519 + XChaCha20 + HKDF + PQ hybrid |
| 370 | │ ├── container.py .sealed binary format |
| 371 | │ ├── manifest.py signed canonical-JSON manifest |
| 372 | │ ├── watermark.py L1 zero-width, L2 whitespace |
| 373 | │ ├── semantic.py L3 synonyms + punctuation |
| 374 | │ ├── synonyms_v2.py 151-class expanded dictionary |
| 375 | │ ├── policy.py not_after / max_opens / jurisdiction |
| 376 | │ ├── beacon.py DNS / HTTP / OCSP / license beacons |
| 377 | │ ├── tlog.py Merkle transparency log |
| 378 | │ ├── rekor.py Sigstore Rekor v2 (DSSE + PAE) |
| 379 | │ ├── timestamp.py RFC 3161 (FreeTSA + DigiCert) |
| 380 | │ ├── decoy.py Ollama-powered decoy files |
| 381 | │ └── formats/{text,image,pdf,docx}.py |
| 382 | ├── oversight_dns/server.py authoritative NS for beacon domain |
| 383 | ├── registry/server.py FastAPI - tlog, signed bundles, rate limit |
| 384 | ├── integrations/ |
| 385 | │ ├── flywheel_oversight_match.py Flywheel scraper hook |
| 386 | │ └── perseus_canarykeeper.py Perseus Discord alert agent |
| 387 | ├── cli/oversight.py |
| 388 | ├── tests/{test_e2e.py,test_e2e_v2.py,test_pq.py,...} |
| 389 | └── docs/{SPEC.md,ROADMAP.md,EMBEDDING.md,security.md} |
| 390 | |
| 391 | oversight-rust/ Rust port (2,934 LOC) |
| 392 | ├── Cargo.toml workspace |
| 393 | ├── oversight-crypto/ X25519, Ed25519, XChaCha20, HKDF, zeroize |
| 394 | ├── oversight-manifest/ JCS canonical JSON, Ed25519 sign/verify |
| 395 | ├── oversight-container/ .sealed format parser, hard caps |
| 396 | ├── oversight-watermark/ L1 + L2 |
| 397 | ├── oversight-tlog/ RFC 6962 Merkle log, signed heads |
| 398 | ├── oversight-policy/ fs2 flock + atomic rename, TOCTOU-safe |
| 399 | ├── oversight-semantic/ 151-class dict + L3 airgap-survivor |
| 400 | ├── oversight-formats/ text, image (DCT), pdf, docx adapters |
| 401 | ├── oversight-rekor/ Sigstore Rekor v2 (DSSE + PAE) |
| 402 | ├── oversight-registry/ Axum + SQLite registry (parity with FastAPI) |
| 403 | ├── oversight-cli/ keygen / seal / open / inspect |
| 404 | ├── fuzz/ cargo-fuzz harnesses (container, manifest) |
| 405 | └── tests/conformance_*.sh bit-for-bit Python<->Rust conformance |
| 406 | ``` |
| 407 | |
| 408 | ## Quickstart from source |
| 409 | |
| 410 | ### Python reference (all features) |
| 411 | |
| 412 | ```bash |
| 413 | pip install -r requirements.txt |
| 414 | python tests/test_e2e.py # 11 checks |
| 415 | python tests/test_e2e_v2.py # 13 checks |
| 416 | python tests/test_pq.py # 7 checks (needs liboqs) |
| 417 | ``` |
| 418 | |
| 419 | ### Rust core (crypto, container, manifest, watermark, CLI) |
| 420 | |
| 421 | ```bash |
| 422 | cd oversight-rust |
| 423 | cargo test --workspace # 142 checks |
| 424 | cargo run -- keygen --out alice.json |
| 425 | cargo run -- seal --input doc.txt --output doc.sealed \ |
| 426 | --issuer issuer.json --recipient-pub <hex> --recipient-id alice@test |
| 427 | cargo run -- open --input doc.sealed --output - --recipient alice.json |
| 428 | ``` |
| 429 | |
| 430 | ### Cross-language conformance |
| 431 | |
| 432 | ```bash |
| 433 | bash oversight-rust/tests/conformance_cross_lang.sh |
| 434 | ``` |
| 435 | |
| 436 | ## Embedding the verification core |
| 437 | |
| 438 | Downstream projects can embed the Oversight Rust verification core without |
| 439 | reimplementing it. The companion mobile verifier |
| 440 | ([`oversight-protocol/oversight-mobile`](https://github.com/oversight-protocol/oversight-mobile)) |
| 441 | does exactly this through `flutter_rust_bridge`, so a manifest that opens on |
| 442 | a desktop opens the same way on a phone with the same answer. |
| 443 | |
| 444 | The full integration contract, including the seven verifier-safe crates, |
| 445 | the crates that are explicitly out of scope for downstream embedding, the |
| 446 | git-plus-tag pin pattern, and the minimum versions for 32-bit mobile |
| 447 | support, is documented at [`docs/EMBEDDING.md`](docs/EMBEDDING.md). v0.4.11 |
| 448 | is the recommended pin for any new embedder; v0.4.8 remains the minimum for |
| 449 | 32-bit Android portability, but the project does not backport fixes below the |
| 450 | current stable line. |
| 451 | |
| 452 | ## Test coverage |
| 453 | |
| 454 | | Layer | Checks | Status | |
| 455 | |---|---|---| |
| 456 | | Python pytest suite | 15 | green | |
| 457 | | Rust oversight-container | 17 | green | |
| 458 | | Rust oversight-crypto | 21 | green | |
| 459 | | Rust oversight-formats | 40 | green | |
| 460 | | Rust oversight-manifest | 3 | green | |
| 461 | | Rust oversight-policy | 7 | green | |
| 462 | | Rust oversight-registry | 17 | green | |
| 463 | | Rust oversight-rekor | 10 | green | |
| 464 | | Rust oversight-semantic | 8 | green | |
| 465 | | Rust oversight-tlog | 14 | green | |
| 466 | | Rust oversight-watermark | 4 | green | |
| 467 | | Cross-language conformance | 2 scripts | green | |
| 468 | | Total automated Rust unit tests | 142 | all green | |
| 469 | |
| 470 | ## Design principles (what Oversight never does) |
| 471 | |
| 472 | - **No custom cryptography.** Every primitive is NIST-standardized or equivalent. `x25519-dalek`, `ed25519-dalek`, `chacha20poly1305`, `hkdf`, `sha2`, ML-KEM-768, ML-DSA-65 via liboqs. That's the whole list. |
| 473 | - **No cloud vendor lock-in.** Dropped the original AWS Nitro Enclaves plan. Hardware-key protection uses any PIV / PKCS#11 hardware key (YubiKey, Nitrokey, OnlyKey); see `docs/HARDWARE_KEYS.md`. Transparency log can run on public Sigstore Rekor or self-hosted; your choice. |
| 474 | - **No RATs, no defensive malware.** Every "phone home" mechanism is a passive beacon - the kind of network call a normal document reader makes during rendering (image fetch, OCSP lookup, DNS resolution). We never execute code on a reader's machine. |
| 475 | - **No tracking of personal identifiers.** Mark IDs are random 128-bit tokens. The registry maps them to recipient IDs that the issuer chose - the issuer decides how much identity binding to apply. |
| 476 | - **No paid service required.** Year-1 all-in cost estimate: ~$6,200 (YubiKeys + domain + one conference). See `docs/ROADMAP.md`. |
| 477 | |
| 478 | ## Honest limitations |
| 479 | |
| 480 | - **Human paraphrasing defeats watermarks.** Someone who reads the document and rewrites it in their own words leaves no trace. Fundamental, not an engineering gap. |
| 481 | - **Beacons fire only when the reader has network access.** Airgapped readers leave no callback. L3 semantic watermarking is the attribution path for that case. |
| 482 | - **The local Python Merkle transparency log is still not a full Sigstore-compatible substitute.** |
| 483 | Public-log interoperability is now via Rekor DSSE attestations; the local log remains |
| 484 | a lightweight registry integrity mechanism, not a drop-in replacement for Rekor. |
| 485 | - **No independent security audit yet.** Planned for 2027. Until then: user-beware, cryptographer-review welcome. Open an issue. |
| 486 | - **Rust port is broad but not final.** The Rust workspace now covers the |
| 487 | cryptographic core, manifests, containers, policy checks, watermark |
| 488 | detection, semantic helpers, format adapters, and the Axum/SQLx registry. |
| 489 | Python remains the canonical reference until the Rust registry finishes |
| 490 | deployment burn-in, migration validation, and v1.0 wire-format freeze. |
| 491 | |
| 492 | ## License |
| 493 | |
| 494 | Apache 2.0. See `LICENSE`. |