# Oversight Registry v1 Interop Draft

Status: v1.0 candidate draft. The wire format is not final until the first
Oversight v1.0 release tag, but the candidate-frozen compatibility surface is
tracked in `docs/REGISTRY_V1_STABILITY.md`. This document tracks the surface a
second operator needs to implement to run a registry that the Python and Rust
reference clients can treat as interchangeable with the origin deployment.

## Goals

- Let more than one operator run a compatible attribution registry so
  "open protocol" is a property of the code and not of a hostname.
- Preserve issuer-signed manifest authority: every registration sidecar
  MUST match the manifest's signed `beacons` and `watermarks` arrays
  byte for byte.
- Keep beacon callbacks authenticated between DNS or web beacon
  collectors and the registry so spoofed events cannot pollute the
  attribution record.
- Preserve local or public transparency-log evidence for every
  registration and every event, and expose proofs that a federated
  verifier can fetch without trusting the operator.

## Common Requirements

### Transport

- All request and response bodies are JSON unless a specific endpoint
  says otherwise. Content-Type MUST be `application/json; charset=utf-8`
  for request bodies that carry one.
- Registries MUST reject identifiers larger than 256 bytes for each of
  `file_id`, `mark_id`, `token_id`, `recipient_id`, and `issuer_id`.
- Registries SHOULD apply a per-client rate limit and return HTTP 429
  with the standard error envelope when exceeded.
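
The identifier bound can be checked before any other processing. The following is a minimal sketch, assuming the 256-byte limit is measured on the UTF-8 encoding of each identifier; `oversized_identifiers` is a hypothetical helper name, not part of the reference registry.

```python
ID_FIELDS = ("file_id", "mark_id", "token_id", "recipient_id", "issuer_id")
MAX_ID_BYTES = 256  # bound stated in the Transport requirements


def oversized_identifiers(fields):
    """Return the names of identifier fields that exceed the byte bound.

    Assumption: length is measured in UTF-8 bytes, not in characters.
    """
    return [
        name
        for name in ID_FIELDS
        if isinstance(fields.get(name), str)
        and len(fields[name].encode("utf-8")) > MAX_ID_BYTES
    ]
```

Measuring bytes rather than characters matters for non-ASCII identifiers: 129 two-byte characters already exceed the bound.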

### Canonicalization

The manifest signature is computed over a canonical JSON serialization
per RFC 8785 (JSON Canonicalization Scheme, "JCS"). Implementations that
deviate cannot verify manifests produced by the reference client.

1. Keys are sorted recursively by UTF-16 code unit (RFC 8785 §3.2.3).
2. String values are emitted as raw UTF-8 with only the mandatory JCS
   escapes (`"`, `\`, and `U+0000`-`U+001F`); non-ASCII characters are
   NOT escaped as `\uXXXX`.
3. Separators are `","` and `":"` with no whitespace.
4. The serialized string is encoded as UTF-8 before being fed to the
   Ed25519 verifier.
5. The `signature_ed25519` field is stripped before canonicalization
   and re-attached to the signed object before it is wire-transmitted.

In Python the canonical form is produced by
`oversight_core.jcs.jcs_dumps(manifest)`. In Rust the reference uses
`serde_jcs::to_vec` with identical output. The cross-language conformance
suite (`oversight-rust/tests/conformance_cross_lang.sh`) pins this with
both an ASCII baseline and a non-ASCII `recipient_id` round trip that
would diverge under any non-JCS serialization.
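
The five rules can be approximated with the Python standard library alone. This is a sketch for illustration, not the reference `jcs_dumps`: the function names are hypothetical, and floats are rejected because RFC 8785 number formatting is not reproduced here. Integers are assumed to stay within the range a JSON double represents exactly.

```python
import json


def _sort_keys_utf16(value):
    """Rebuild dicts with keys ordered by UTF-16 code units (rule 1)."""
    if isinstance(value, dict):
        ordered = sorted(value, key=lambda k: k.encode("utf-16-be"))
        return {k: _sort_keys_utf16(value[k]) for k in ordered}
    if isinstance(value, list):
        return [_sort_keys_utf16(v) for v in value]
    if isinstance(value, float):
        # Assumption of this sketch: manifests carry no floats.
        raise TypeError("float serialization is outside this sketch")
    return value


def canonical_bytes_sketch(manifest):
    """Bytes to hand to the Ed25519 verifier, under the assumptions above."""
    # Rule 5: the signature field is not part of the signed bytes.
    unsigned = {k: v for k, v in manifest.items() if k != "signature_ed25519"}
    text = json.dumps(
        _sort_keys_utf16(unsigned),
        ensure_ascii=False,     # rule 2: no \uXXXX for non-ASCII
        separators=(",", ":"),  # rule 3: no whitespace
    )
    return text.encode("utf-8")  # rule 4
```

Sorting on the UTF-16-BE encoding is what separates this from `sort_keys=True`, which orders by code point and so places a key such as `U+FB33` before an astral-plane key, the reverse of what JCS requires.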

### Signature verification

- Registries MUST verify `manifest.signature_ed25519` before writing
  any beacon, watermark, corpus hash, Rekor entry, or transparency-log
  event.
- Registries MUST NOT accept beacon or watermark sidecars that differ
  from the manifest's signed arrays. Comparison uses the canonicalized
  per-item JSON after sorting by canonical bytes.
- Re-registration under the same `file_id` MUST require the same
  `issuer_ed25519_pub` as the original record. A mismatch returns
  HTTP 409.
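
The sidecar comparison can be sketched as follows. Each item is canonicalized, the canonical byte strings are sorted, and the two sorted lists are compared. The `_canon` helper is a stand-in that uses `json.dumps` with sorted keys; a real registry would call its JCS implementation instead. Both function names are hypothetical.

```python
import json


def _canon(item):
    # Stand-in for JCS; adequate for ASCII keys and string or integer values.
    return json.dumps(
        item, sort_keys=True, ensure_ascii=False, separators=(",", ":")
    ).encode("utf-8")


def sidecar_matches(signed_items, sidecar_items):
    """True when both arrays hold the same items, ignoring array order."""
    return sorted(_canon(i) for i in signed_items) == sorted(
        _canon(i) for i in sidecar_items
    )
```

Sorting by canonical bytes makes the check independent of array order and of key order within each item, while any difference in content still fails.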

### Operator authentication

Public operator deployments SHOULD protect write-side registry APIs with
an operator token. If configured, `POST /register` and `POST /attribute`
MUST require either `Authorization: Bearer <token>` or
`X-Oversight-Operator-Token: <token>`. Leaving the token unset preserves
local development and unauthenticated conformance-harness behavior.

### Error envelope

Non-2xx responses MUST carry a JSON envelope:

```json
{"error": {"code": "signature_invalid", "message": "manifest signature invalid"}}
```

Implementations MAY include additional fields under `error` (for
example, `retry_after` on 429), but consumers rely only on `code`
and `message`.

The defined `code` values in v1:

| Code | HTTP | When |
|------|------|------|
| `missing_field` | 400 | A required field is absent |
| `signature_invalid` | 400 | Manifest Ed25519 verification failed |
| `sidecar_mismatch` | 400 | Request beacons or watermarks differ from the signed manifest |
| `issuer_mismatch` | 409 | `file_id` already registered under a different issuer pubkey |
| `auth_required` | 401 | DNS event callback missing required secret |
| `rate_limited` | 429 | Client exceeded per-key token bucket |
| `not_found` | 404 | Queried record does not exist |
| `server_error` | 500 | Registry internal failure |
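
One way to keep status codes and envelope codes in step is to derive both from a single table. The sketch below mirrors the table above; `error_response` is a hypothetical helper name, and returning a `(status, body)` pair is an assumption about the surrounding web framework.

```python
import json

# Code-to-status mapping copied from the v1 table.
ERROR_STATUS = {
    "missing_field": 400,
    "signature_invalid": 400,
    "sidecar_mismatch": 400,
    "issuer_mismatch": 409,
    "auth_required": 401,
    "rate_limited": 429,
    "not_found": 404,
    "server_error": 500,
}


def error_response(code, message, **extra):
    """Build (http_status, json_body) for a defined v1 error code.

    Extra keyword arguments land under "error", e.g. retry_after on 429.
    """
    status = ERROR_STATUS[code]  # KeyError on an undefined code is deliberate
    body = {"error": {"code": code, "message": message, **extra}}
    return status, json.dumps(body)
```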

## Endpoints

| Method | Path | Purpose |
|--------|------|---------|
| `GET` | `/health` | Liveness and local tlog size |
| `GET` | `/.well-known/oversight-registry` | Registry identity advertisement |
| `POST` | `/register` | Register signed manifest, beacons, watermarks, optional corpus hashes |
| `POST` | `/attribute` | Look up attribution by `token_id`, `mark_id`, or perceptual hash |
| `POST` | `/dns_event` | Authenticated DNS beacon callback |
| `GET` | `/evidence/{file_id}` | Evidence bundle with manifest, events, tlog proofs, and signed tree head |
| `GET` | `/tlog/head` | Current signed tree head for the local transparency log |
| `GET` | `/tlog/proof/{index}` | Inclusion proof for a local tlog entry |
| `GET` | `/tlog/range` | Entry range, used by federated verifiers or monitors |
| `GET` | `/p/{token_id}.png` | HTTP pixel beacon, records an event |
| `GET` | `/r/{token_id}`, `/ocsp/r/{token_id}` | OCSP-shaped beacon, records an event |
| `GET` | `/v/{token_id}`, `/lic/v/{token_id}` | License-check beacon, records an event |
| `GET` | `/candidates/semantic` | Recent L3 mark IDs for scraper-style verification |

## `/health`

```json
{"status": "ok", "service": "oversight-registry", "version": "0.2.1", "tlog_size": 42}
```

`status` is `"ok"` or `"degraded"`. `service` MUST begin with
`oversight-registry` so that a service cannot pass as a registry unless
it deliberately misreports its identity. `tlog_size` is the current
local transparency-log leaf count.

## `/.well-known/oversight-registry`

```json
{
  "ed25519_pub": "<hex>",
  "version": "0.2.1",
  "jurisdiction": "GLOBAL",
  "tlog_size": 42,
  "federation": {
    "spec_version": "v1",
    "canonicalization": "json-sort-keys-compact-utf8",
    "rekor_enabled": true
  }
}
```

`ed25519_pub` is the registry's own Ed25519 signing public key,
hex-encoded, and is the stable identifier a federated verifier uses to
tell operators apart. `federation.spec_version` MUST be `"v1"` for
registries that implement this document. Unknown `federation.*` fields
MUST be ignored by consumers so the shape can extend without breaking
older clients.

## `/register`

Request:

```json
{
  "manifest": { "...": "see docs/SPEC.md" },
  "beacons": [ { "token_id": "...", "kind": "dns|http|ocsp|license" } ],
  "watermarks": [ { "mark_id": "...", "layer": "L1|L2|L3_semantic" } ],
  "corpus": { "winnowing": "optional-hash", "sentence": "optional-hash" }
}
```

Validation order:

1. `manifest.file_id` MUST be present and fit the 256-byte bound.
2. `manifest.signature_ed25519` MUST verify over the canonical bytes
   (see Canonicalization).
3. `manifest.issuer_ed25519_pub` MUST be present.
4. `beacons` and `watermarks` sidecars MUST equal the signed arrays
   under canonical comparison.
5. Prior registration of the same `file_id` MUST have come from the
   same `issuer_ed25519_pub`.
6. A transparency-log event is appended before the response is sent.
7. If Rekor attestation is enabled, the registry uses
   `subject.name = "mark:<mark_id>"` and
   `subject.digest.sha256 = manifest.content_hash`.

Success response:

```json
{
  "ok": true,
  "file_id": "uuid",
  "registered_beacons": 1,
  "tlog_index": 42,
  "rekor": {"log_url": "...", "log_index": 12345, "log_id": "...", "integrated_time": 1730000000}
}
```

`rekor` is present when public attestation is enabled. Absent or empty
`rekor` is not an error.

## `/attribute`

The request accepts exactly one of `token_id`, `mark_id` (with optional
`layer`), or `perceptual_hash`. A body that supplies none of these, or
more than one, returns `missing_field`.
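
The exactly-one rule is easy to get wrong when `layer` is also present. A minimal sketch follows, assuming that an absent key and an empty string both count as "not supplied"; `pick_selector` is a hypothetical helper name.

```python
SELECTORS = ("token_id", "mark_id", "perceptual_hash")


def pick_selector(body):
    """Return the single selector supplied in the body, or None.

    None means the caller should answer with the missing_field error.
    The optional "layer" key is not a selector and is ignored here.
    """
    present = [k for k in SELECTORS if body.get(k) not in (None, "")]
    if len(present) != 1:
        return None
    return present[0]
```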

Success response on a hit:

```json
{
  "found": true,
  "file_id": "uuid",
  "recipient_id": "...",
  "issuer_id": "...",
  "manifest": { "..." : "..." },
  "events": [ { "kind": "dns", "timestamp": 0, "source_ip": "..." } ]
}
```

A miss returns `{"found": false}` with HTTP 200. Bare 404s are reserved
for unknown endpoints, not for search misses.

## `/dns_event`

Request:

```json
{
  "token_id": "hex-or-url-safe",
  "client_ip": "collector-observed-ip",
  "qtype": "A",
  "qname": "token.beacon.example"
}
```

Authentication:

- Loopback clients are trusted without a secret so a DNS server on
  the same host can call without extra configuration.
- Non-loopback callers MUST send either `Authorization: Bearer <secret>`
  or `X-Oversight-DNS-Secret: <secret>` matching the registry's configured
  secret. The comparison MUST be constant-time (`hmac.compare_digest` or
  equivalent).
- A registry that has no secret configured MUST refuse non-loopback
  callers. Silent acceptance of unauthenticated non-loopback events
  is a conformance failure.
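
The three authentication rules combine into one decision. The sketch below assumes `remote_addr` is the socket peer address (not a proxy-supplied header) and that `headers` is a plain dict with lowercase header names; `dns_event_authorized` is a hypothetical helper name.

```python
import hmac
import ipaddress


def dns_event_authorized(remote_addr, headers, configured_secret):
    """Apply the /dns_event authentication rules to one request."""
    # Rule 1: loopback callers need no secret.
    if ipaddress.ip_address(remote_addr).is_loopback:
        return True
    # Rule 3: with no secret configured, non-loopback callers are refused.
    if not configured_secret:
        return False
    # Rule 2: accept either header form.
    presented = headers.get("x-oversight-dns-secret")
    auth = headers.get("authorization", "")
    if presented is None and auth.startswith("Bearer "):
        presented = auth[len("Bearer "):]
    if presented is None:
        return False
    # Constant-time comparison, as the requirement demands.
    return hmac.compare_digest(
        presented.encode("utf-8"), configured_secret.encode("utf-8")
    )
```

Checking for a missing secret before reading any header makes the refuse-by-default rule hold even when a caller sends an empty or guessed value.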

Success response:

```json
{"ok": true, "tlog_index": 42}
```

## `/evidence/{file_id}`

Evidence bundles carry everything a recipient or auditor needs to
verify attribution without trusting the registry operator. The reference
shape is flat so a verifier can pull each artifact with a single JSON
dereference.

Required top-level fields:

- `file_id`: echoes the path parameter
- `bundle_generated_at`: registry clock timestamp, for context
- `registry_pub`: the registry's Ed25519 public key hex, matching
  `/.well-known/oversight-registry`
- `manifest`: the signed manifest object (signature still attached)
- `beacons`: registered beacon rows for this file
- `watermarks`: registered watermark rows for this file
- `events`: registry event rows for this file, ordered by timestamp
- `tlog_head`: the current signed tree head; when the registry has no
  transparency log configured, this field is `null`
- `tlog_proofs`: array of inclusion proofs for the rows in `events`
  that have a `tlog_index`; each proof carries `event_row`,
  `tlog_index`, and `inclusion`

Optional fields:

- `rekor`: the sigstore-compatible DSSE bundle when public attestation
  is enabled; `bundle_schema` MUST be `2`
- `disclaimer`: a human-readable note about the bundle's legal posture
- `bundle_signature_ed25519`: registry signature over the canonical
  bundle bytes, present on all conforming responses

Unknown `file_id` returns HTTP 404 with the standard error envelope.

## `/tlog/head`, `/tlog/proof/{index}`, `/tlog/range`

These expose the local transparency log so a federated verifier can
monitor it without relying on the registry's own query responses.
The signed tree head MUST be Ed25519-signed by the registry identity
key advertised at `/.well-known/oversight-registry`.
`/tlog/range` entries carry `index`, `leaf_hash`, `leaf_data`, and MAY
carry `leaf_data_hex`. `leaf_data_hex`, when present, is the exact leaf
bytes encoded as lowercase hex. Verifiers MUST recompute
`SHA-256(0x00 || leaf_bytes)` and compare it to `leaf_hash`; legacy
entries without `leaf_data_hex` use the UTF-8 bytes of `leaf_data`.
Registries MUST fail a range request rather than omit malformed,
non-contiguous, or hash-mismatched records from the requested window.
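
A verifier's per-entry check can be sketched with `hashlib`. Two things here are assumptions rather than statements from this document: that `leaf_hash` is hex-encoded, and that `leaf_data` is a JSON string. The helper names are hypothetical.

```python
import hashlib


def leaf_bytes(entry):
    """Exact leaf bytes: leaf_data_hex when present, else UTF-8 of leaf_data."""
    hex_data = entry.get("leaf_data_hex")
    if hex_data is not None:
        return bytes.fromhex(hex_data)
    return entry["leaf_data"].encode("utf-8")  # legacy entries


def leaf_hash_matches(entry):
    """Recompute SHA-256(0x00 || leaf_bytes) and compare with leaf_hash."""
    digest = hashlib.sha256(b"\x00" + leaf_bytes(entry)).hexdigest()
    return digest == entry["leaf_hash"].lower()  # assumes hex encoding


def range_is_contiguous(entries):
    """True when indexes increase by exactly one across the window."""
    return all(
        e["index"] == entries[0]["index"] + offset
        for offset, e in enumerate(entries)
    )
```

A monitor would run both checks over every `/tlog/range` response and treat any failure as evidence against the operator, not as a record to skip.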

## Beacon endpoints

Beacon paths are normative because manifests embed URLs that follow
these shapes and the Python and Rust clients assemble them the same
way.

| Path | Kind stored in `events` |
|------|------------------------|
| `GET /p/{token_id}.png` | `http_img` |
| `GET /r/{token_id}`, `GET /ocsp/r/{token_id}` | `ocsp` |
| `GET /v/{token_id}`, `GET /lic/v/{token_id}` | `license` |

Responses MUST return 200 for well-formed token IDs so resolvers and
document viewers do not retry. The pixel endpoint returns a 1x1 PNG;
the OCSP endpoint returns an empty 200; the license endpoint returns
`{"valid": true}`.
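
The path-to-kind table can be expressed as a small routing function. This document does not define what makes a token ID well formed, so the pattern below (1 to 256 URL-safe characters) is an assumption of the sketch, as is the name `classify_beacon_path`.

```python
import re

# Assumed token shape: URL-safe characters, within the 256-byte bound.
_TOKEN = r"(?P<token>[A-Za-z0-9_-]{1,256})"

BEACON_ROUTES = [
    (re.compile(rf"/p/{_TOKEN}\.png"), "http_img"),
    (re.compile(rf"(?:/ocsp)?/r/{_TOKEN}"), "ocsp"),
    (re.compile(rf"(?:/lic)?/v/{_TOKEN}"), "license"),
]


def classify_beacon_path(path):
    """Return (kind, token_id) for a beacon path, or None if none matches."""
    for pattern, kind in BEACON_ROUTES:
        match = pattern.fullmatch(path)
        if match:
            return kind, match.group("token")
    return None
```

Both aliases of each beacon map to the same stored kind, which keeps event rows comparable regardless of which URL shape a manifest embedded.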

## Federation notes

The wire format MUST NOT require the official `oversightprotocol.dev`
domain. Operators run their own registry and beacon domains; manifests
declare the registry URL and beacon descriptors unambiguously.

Operators SHOULD:

- Publish `/.well-known/oversight-registry` on HTTPS.
- Serve a stable `ed25519_pub`. Rotating this key breaks the chain
  of evidence for already-registered files.
- Run with Rekor attestation enabled so the public log is the root of
  trust for federated verifiers.

## Conformance

The repository ships a conformance harness at
`tests/test_registry_conformance.py` that exercises every endpoint in
this document against a registry URL. The harness is the canonical
test of whether an independent implementation is compatible. Operators
run it with:

```
OVERSIGHT_REGISTRY_URL=https://registry.example.org \
  python3 tests/test_registry_conformance.py
```

The harness uses a throwaway issuer identity, posts a minimal valid
manifest, and then validates the responses. It also checks representative
error envelope codes for malformed or missing inputs. Runs against the
local reference registry are included in CI; operator-hosted runs are the
interop acceptance gate for federation.