Zion Boggan
repos/Oversight/docs/spec/registry-v1.md
# Oversight Registry v1 Interop Draft

Status: v1.0 candidate draft. The wire format is not final until the first
Oversight v1.0 release tag, but the candidate-frozen compatibility surface is
tracked in `docs/REGISTRY_V1_STABILITY.md`. This document tracks the surface a
second operator needs to implement to run a registry that the Python and Rust
reference clients can treat as interchangeable with the origin deployment.

## Goals

- Let more than one operator run a compatible attribution registry so
  "open protocol" is a property of the code and not of a hostname.
- Preserve issuer-signed manifest authority: every registration sidecar
  MUST match the manifest's signed `beacons` and `watermarks` arrays
  byte for byte.
- Keep beacon callbacks authenticated between DNS or web beacon
  collectors and the registry so spoofed events cannot pollute the
  attribution record.
- Preserve local or public transparency-log evidence for every
  registration and every event, and expose proofs that a federated
  verifier can fetch without trusting the operator.
 
## Common Requirements

### Transport

- All request and response bodies are JSON unless a specific endpoint
  says otherwise. Content-Type MUST be `application/json; charset=utf-8`
  for request bodies that carry one.
- Registries MUST reject identifiers larger than 256 bytes for each of
  `file_id`, `mark_id`, `token_id`, `recipient_id`, and `issuer_id`.
- Registries SHOULD apply a per-client rate limit and return HTTP 429
  with the standard error envelope when exceeded.
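
The identifier bound can be sketched as a small check. This is an illustration rather than the reference implementation: the helper name `oversized_identifiers` is hypothetical, and it assumes the 256-byte limit is measured on the UTF-8 encoding of each identifier.

```python
MAX_ID_BYTES = 256
BOUNDED_FIELDS = ("file_id", "mark_id", "token_id", "recipient_id", "issuer_id")


def oversized_identifiers(fields: dict) -> list:
    """Return the bounded identifier names whose UTF-8 encoding exceeds 256 bytes."""
    return [
        name
        for name in BOUNDED_FIELDS
        # Assumption: "bytes" means the length of the UTF-8 encoding.
        if isinstance(fields.get(name), str)
        and len(fields[name].encode("utf-8")) > MAX_ID_BYTES
    ]
```

A registry would reject the request when the returned list is non-empty. Counting encoded bytes rather than characters matters for non-ASCII identifiers, where one character can occupy several bytes.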
 
### Canonicalization

The manifest signature is computed over a canonical JSON serialization
per RFC 8785 (JSON Canonicalization Scheme, "JCS"). Implementations that
deviate cannot verify manifests produced by the reference client.

1. Keys are sorted recursively by UTF-16 code unit (RFC 8785 §3.2.3).
2. String values are emitted as raw UTF-8 with only the mandatory JCS
   escapes (`"`, `\`, and `U+0000`-`U+001F`); non-ASCII characters are
   NOT escaped as `\uXXXX`.
3. Separators are `","` and `":"` with no whitespace.
4. The serialized string is encoded as UTF-8 before being fed to the
   Ed25519 verifier.
5. The `signature_ed25519` field is stripped before canonicalization
   and re-attached to the signed object before it is wire-transmitted.

In Python the canonical form is produced by
`oversight_core.jcs.jcs_dumps(manifest)`. In Rust the reference uses
`serde_jcs::to_vec` with identical output. The cross-language conformance
suite (`oversight-rust/tests/conformance_cross_lang.sh`) pins this with
both an ASCII baseline and a non-ASCII `recipient_id` round trip that
would diverge under any non-JCS serialization.
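
The five rules above can be illustrated with a standard-library sketch. It is not `jcs_dumps` and makes no claim to match it in every case: the function names are hypothetical, and floating-point values are rejected because JCS number formatting is outside what this sketch covers.

```python
import json


def _encode(value) -> str:
    if isinstance(value, dict):
        # Rule 1: comparing UTF-16-BE bytes orders keys by UTF-16 code unit.
        items = sorted(value.items(), key=lambda kv: kv[0].encode("utf-16-be"))
        # Rules 2 and 3: raw non-ASCII characters, "," and ":" with no whitespace.
        return "{" + ",".join(
            json.dumps(k, ensure_ascii=False) + ":" + _encode(v) for k, v in items
        ) + "}"
    if isinstance(value, (list, tuple)):
        return "[" + ",".join(_encode(v) for v in value) + "]"
    if isinstance(value, float):
        raise TypeError("floats need JCS number formatting, which this sketch omits")
    # Strings, integers, booleans, and None.
    return json.dumps(value, ensure_ascii=False)


def canonical_bytes(manifest: dict) -> bytes:
    """Bytes to hand to an Ed25519 verifier, under the assumptions stated above."""
    # Rule 5: the signature field is not part of the signed bytes.
    unsigned = {k: v for k, v in manifest.items() if k != "signature_ed25519"}
    # Rule 4: UTF-8 encode the serialized string.
    return _encode(unsigned).encode("utf-8")
```

Sorting by UTF-16 code unit only differs from code-point order when a key contains characters outside the Basic Multilingual Plane, which is why a plain `sort_keys=True` appears to work until such a key shows up.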
 
### Signature verification

- Registries MUST verify `manifest.signature_ed25519` before writing
  any beacon, watermark, corpus hash, Rekor entry, or transparency-log
  event.
- Registries MUST NOT accept beacon or watermark sidecars that differ
  from the manifest's signed arrays. Comparison uses the canonicalized
  per-item JSON after sorting by canonical bytes.
- Re-registration under the same `file_id` MUST require the same
  `issuer_ed25519_pub` as the original record. A mismatch returns
  HTTP 409.
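
The sidecar comparison can be sketched as follows. The helper names are hypothetical, and the per-item serialization uses `json.dumps` with sorted keys, which matches the canonical form only under the assumption that items have ASCII keys and string or integer values.

```python
import json


def _canonical_item(item: dict) -> bytes:
    # Compact, key-sorted UTF-8 JSON (assumption: ASCII keys, no floats).
    return json.dumps(
        item, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def sidecar_matches(signed_items: list, sidecar_items: list) -> bool:
    """Compare two arrays after canonicalizing each item and sorting by bytes."""
    return sorted(map(_canonical_item, signed_items)) == sorted(
        map(_canonical_item, sidecar_items)
    )
```

Because both sides are sorted by canonical bytes, neither array order nor key order inside an item affects the result, while any changed, added, or dropped item does.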
 
### Operator authentication

Public operator deployments SHOULD protect write-side registry APIs with
an operator token. If configured, `POST /register` and `POST /attribute`
MUST require either `Authorization: Bearer <token>` or
`X-Oversight-Operator-Token: <token>`. Leaving the token unset preserves
local development and unauthenticated conformance-harness behavior.

### Error envelope

Non-2xx responses MUST carry a JSON envelope:

```json
{"error": {"code": "signature_invalid", "message": "manifest signature invalid"}}
```

Implementations MAY include additional fields under `error` (for
example, `retry_after` on 429), but consumers rely only on `code`
and `message`.

The defined `code` values in v1:

| Code | HTTP | When |
|------|------|------|
| `missing_field` | 400 | A required field is absent |
| `signature_invalid` | 400 | Manifest Ed25519 verification failed |
| `sidecar_mismatch` | 400 | Request beacons or watermarks differ from the signed manifest |
| `issuer_mismatch` | 409 | `file_id` already registered under a different issuer pubkey |
| `auth_required` | 401 | DNS event callback missing required secret |
| `rate_limited` | 429 | Client exceeded per-key token bucket |
| `not_found` | 404 | Queried record does not exist |
| `server_error` | 500 | Registry internal failure |
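
One way to keep codes and HTTP statuses in step is to drive both from the table above. The helper below is a sketch with a hypothetical name, `error_response`; it is not taken from the reference registry.

```python
import json

# Code-to-status mapping copied from the v1 table.
ERROR_STATUS = {
    "missing_field": 400,
    "signature_invalid": 400,
    "sidecar_mismatch": 400,
    "issuer_mismatch": 409,
    "auth_required": 401,
    "rate_limited": 429,
    "not_found": 404,
    "server_error": 500,
}


def error_response(code: str, message: str, **extra) -> tuple:
    """Return (http_status, json_body) for a v1 error code."""
    status = ERROR_STATUS[code]  # KeyError for codes outside the v1 table
    # Extra fields such as retry_after sit under "error", next to code and message.
    envelope = {"error": {"code": code, "message": message, **extra}}
    return status, json.dumps(envelope)
```

For example, `error_response("rate_limited", "slow down", retry_after=30)` yields status 429 and an envelope whose `error` object carries the additional `retry_after` field.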
 
## Endpoints

| Method | Path | Purpose |
|--------|------|---------|
| `GET`  | `/health` | Liveness and local tlog size |
| `GET`  | `/.well-known/oversight-registry` | Registry identity advertisement |
| `POST` | `/register` | Register signed manifest, beacons, watermarks, optional corpus hashes |
| `POST` | `/attribute` | Look up attribution by `token_id`, `mark_id`, or perceptual hash |
| `POST` | `/dns_event` | Authenticated DNS beacon callback |
| `GET`  | `/evidence/{file_id}` | Evidence bundle with manifest, events, tlog proofs, and signed tree head |
| `GET`  | `/tlog/head` | Current signed tree head for the local transparency log |
| `GET`  | `/tlog/proof/{index}` | Inclusion proof for a local tlog entry |
| `GET`  | `/tlog/range` | Entry range, used by federated verifiers or monitors |
| `GET`  | `/p/{token_id}.png` | HTTP pixel beacon, records an event |
| `GET`  | `/r/{token_id}`, `/ocsp/r/{token_id}` | OCSP-shaped beacon, records an event |
| `GET`  | `/v/{token_id}`, `/lic/v/{token_id}` | License-check beacon, records an event |
| `GET`  | `/candidates/semantic` | Recent L3 mark IDs for scraper-style verification |

## `/health`

```json
{"status": "ok", "service": "oversight-registry", "version": "0.2.1", "tlog_size": 42}
```

`status` is `"ok"` or `"degraded"`. `service` MUST begin with
`oversight-registry`, so a service that is not an Oversight registry
can only pass this check by deliberately misreporting its identity.
`tlog_size` is the current local transparency-log leaf count.
 
## `/.well-known/oversight-registry`

```json
{
  "ed25519_pub": "<hex>",
  "version": "0.2.1",
  "jurisdiction": "GLOBAL",
  "tlog_size": 42,
  "federation": {
    "spec_version": "v1",
    "canonicalization": "json-sort-keys-compact-utf8",
    "rekor_enabled": true
  }
}
```

`ed25519_pub` is the registry's own signing key hex and is the stable
identifier a federated verifier uses to tell operators apart.
`federation.spec_version` MUST be `"v1"` for registries that implement
this document. Unknown `federation.*` fields MUST be ignored by
consumers so the shape can extend without breaking older clients.
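
A consumer-side reading of this advertisement might look like the sketch below. The function name is hypothetical, and two choices are assumptions rather than requirements of this document: the key is lowercased before being used as an identifier, and its length is checked against the 32 bytes of an Ed25519 public key.

```python
def parse_registry_identity(doc: dict) -> dict:
    """Extract the fields a v1 consumer relies on; unknown federation.* keys are ignored."""
    federation = doc.get("federation", {})
    if federation.get("spec_version") != "v1":
        raise ValueError("registry does not advertise federation.spec_version v1")
    pub = doc["ed25519_pub"]
    # bytes.fromhex raises ValueError when the key is not valid hex.
    if len(bytes.fromhex(pub)) != 32:
        raise ValueError("ed25519_pub is not a 32-byte key")
    return {
        # Assumption: lowercase hex is used as the stable identifier.
        "ed25519_pub": pub.lower(),
        "rekor_enabled": bool(federation.get("rekor_enabled", False)),
    }
```

Only named keys are read, so a later `federation` field passes through without affecting older clients.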
 
## `/register`

Request:

```json
{
  "manifest": { "...": "see docs/SPEC.md" },
  "beacons":  [ { "token_id": "...", "kind": "dns|http|ocsp|license" } ],
  "watermarks": [ { "mark_id": "...", "layer": "L1|L2|L3_semantic" } ],
  "corpus": { "winnowing": "optional-hash", "sentence": "optional-hash" }
}
```

Validation order:

1. `manifest.file_id` MUST be present and fit the 256-byte bound.
2. `manifest.signature_ed25519` MUST verify over the canonical bytes
   (see Canonicalization).
3. `manifest.issuer_ed25519_pub` MUST be present.
4. `beacons` and `watermarks` sidecars MUST equal the signed arrays
   under canonical comparison.
5. Prior registration of the same `file_id` MUST have come from the
   same `issuer_ed25519_pub`.
6. A transparency-log event is appended before the response is sent.
7. If Rekor attestation is enabled, the registry uses
   `subject.name = "mark:<mark_id>"` and
   `subject.digest.sha256 = manifest.content_hash`.
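
Steps 1 through 5 can be sketched as an ordered check that reports the first step that fails. Everything named here is hypothetical: `first_failed_step` is not a reference API, and `verify_signature` and `sidecars_match` are injected callables standing in for Ed25519 verification and canonical sidecar comparison. The 256-byte bound is assumed to apply to the UTF-8 encoding.

```python
from typing import Callable, Optional


def first_failed_step(
    body: dict,
    prior_issuer_pub: Optional[str],
    verify_signature: Callable[[dict], bool],
    sidecars_match: Callable[[list, list], bool],
) -> Optional[int]:
    """Return the number of the first failed validation step, or None if 1-5 pass."""
    manifest = body.get("manifest") or {}
    file_id = manifest.get("file_id")
    # Step 1: file_id present and within the 256-byte bound.
    if not isinstance(file_id, str) or not file_id or len(file_id.encode("utf-8")) > 256:
        return 1
    # Step 2: signature verifies over the canonical bytes.
    if not verify_signature(manifest):
        return 2
    # Step 3: issuer public key present.
    issuer_pub = manifest.get("issuer_ed25519_pub")
    if not issuer_pub:
        return 3
    # Step 4: both sidecars equal the signed arrays.
    for name in ("beacons", "watermarks"):
        if not sidecars_match(manifest.get(name, []), body.get(name, [])):
            return 4
    # Step 5: a prior registration must have used the same issuer key.
    if prior_issuer_pub is not None and prior_issuer_pub != issuer_pub:
        return 5
    return None
```

Nothing is written before the function returns `None`, so a failed signature or sidecar check leaves no beacon, watermark, or log entry behind.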
 
Success response:

```json
{
  "ok": true,
  "file_id": "uuid",
  "registered_beacons": 1,
  "tlog_index": 42,
  "rekor": {"log_url": "...", "log_index": 12345, "log_id": "...", "integrated_time": 1730000000}
}
```

`rekor` is present when public attestation is enabled. Absent or empty
`rekor` is not an error.

## `/attribute`

The request accepts exactly one of `token_id`, `mark_id` (with optional
`layer`), or `perceptual_hash`. A body that populates none of these
fields, or more than one, returns `missing_field`.
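
The exactly-one rule can be sketched as a small selector check. The name `pick_selector` is hypothetical, and treating an empty string or `null` as "not populated" is an assumption of this sketch.

```python
SELECTORS = ("token_id", "mark_id", "perceptual_hash")


def pick_selector(body: dict):
    """Return (name, value) for the single populated selector, or None if invalid."""
    # Assumption: empty strings and null values count as unpopulated.
    populated = [(name, body[name]) for name in SELECTORS if body.get(name)]
    if len(populated) != 1:
        return None  # the caller answers with the missing_field error envelope
    return populated[0]
```

`layer` is deliberately not a selector, so a body with `mark_id` and `layer` still counts as exactly one.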
 
Success response on a hit:

```json
{
  "found": true,
  "file_id": "uuid",
  "recipient_id": "...",
  "issuer_id": "...",
  "manifest": { "..." : "..." },
  "events": [ { "kind": "dns", "timestamp": 0, "source_ip": "..." } ]
}
```

A miss returns `{"found": false}` with HTTP 200. Bare 404s are reserved
for unknown endpoints, not for search misses.

## `/dns_event`

Request:

```json
{
  "token_id": "hex-or-url-safe",
  "client_ip": "collector-observed-ip",
  "qtype": "A",
  "qname": "token.beacon.example"
}
```

Authentication:

- Loopback clients are trusted without a secret so a DNS server on
  the same host can call without extra configuration.
- Non-loopback callers MUST send either `Authorization: Bearer <secret>`
  or `X-Oversight-DNS-Secret: <secret>` matching the registry's configured
  secret. The comparison MUST be constant-time (`hmac.compare_digest` or
  equivalent).
- A registry that has no secret configured MUST refuse non-loopback
  callers. Silent acceptance of unauthenticated non-loopback events
  is a conformance failure.
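
The three authentication rules can be sketched as a single decision function. The name `dns_event_authorized` is hypothetical, `peer_ip` is assumed to be the transport-level address of the caller (not the `client_ip` field in the body), and header lookup is assumed to use the exact names shown; real frameworks usually match header names case-insensitively.

```python
import hmac
import ipaddress
from typing import Mapping, Optional


def dns_event_authorized(
    peer_ip: str, headers: Mapping[str, str], configured_secret: Optional[str]
) -> bool:
    # Rule 1: loopback callers are trusted without a secret.
    if ipaddress.ip_address(peer_ip).is_loopback:
        return True
    # Rule 3: with no secret configured, every non-loopback caller is refused.
    if not configured_secret:
        return False
    # Rule 2: accept the secret from either header.
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        presented = auth[len("Bearer "):]
    else:
        presented = headers.get("X-Oversight-DNS-Secret", "")
    # Constant-time comparison, as the spec requires.
    return hmac.compare_digest(
        presented.encode("utf-8"), configured_secret.encode("utf-8")
    )
```

Checking for a missing secret before reading any header makes the "no secret configured" case fail closed regardless of what the caller sends.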
 
Success response:

```json
{"ok": true, "tlog_index": 42}
```

## `/evidence/{file_id}`

Evidence bundles carry everything a recipient or auditor needs to
verify attribution without trusting the registry operator. The reference
shape is flat so a verifier can pull each artifact with a single JSON
dereference.

Required top-level fields:

- `file_id`: echoes the path parameter
- `bundle_generated_at`: registry clock timestamp, for context
- `registry_pub`: the registry's Ed25519 public key hex, matching
  `/.well-known/oversight-registry`
- `manifest`: the signed manifest object (signature still attached)
- `beacons`: registered beacon rows for this file
- `watermarks`: registered watermark rows for this file
- `events`: registry event rows for this file, ordered by timestamp
- `tlog_head`: the current signed tree head; when the registry has no
  transparency log configured, this field is `null`
- `tlog_proofs`: array of inclusion proofs for the rows in `events`
  that have a `tlog_index`; each proof carries `event_row`,
  `tlog_index`, and `inclusion`

Optional fields:

- `rekor`: the sigstore-compatible DSSE bundle when public attestation
  is enabled; `bundle_schema` MUST be `2`
- `disclaimer`: a human-readable note about the bundle's legal posture
- `bundle_signature_ed25519`: registry signature over the canonical
  bundle bytes, present on all conforming responses

Unknown `file_id` returns HTTP 404 with the standard error envelope.
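
A verifier can check the shape of a bundle before doing any cryptographic work. The sketch below covers only the structural requirements listed above; `bundle_problems` is a hypothetical helper, and signature and inclusion-proof verification are out of its scope.

```python
REQUIRED_BUNDLE_FIELDS = (
    "file_id", "bundle_generated_at", "registry_pub", "manifest",
    "beacons", "watermarks", "events", "tlog_head", "tlog_proofs",
)


def bundle_problems(bundle: dict, requested_file_id: str, advertised_pub: str) -> list:
    """Return human-readable shape problems; an empty list means the shape is fine."""
    # tlog_head may be null, so test for key presence rather than truthiness.
    problems = [
        f"missing field: {name}" for name in REQUIRED_BUNDLE_FIELDS if name not in bundle
    ]
    if bundle.get("file_id") != requested_file_id:
        problems.append("file_id does not echo the path parameter")
    if bundle.get("registry_pub") != advertised_pub:
        problems.append("registry_pub differs from the well-known advertisement")
    for proof in bundle.get("tlog_proofs") or []:
        for key in ("event_row", "tlog_index", "inclusion"):
            if key not in proof:
                problems.append(f"tlog proof missing {key}")
    return problems
```

Here `advertised_pub` is the `ed25519_pub` value the verifier fetched separately from `/.well-known/oversight-registry`.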
 
## `/tlog/head`, `/tlog/proof/{index}`, `/tlog/range`

These expose the local transparency log so a federated verifier can
monitor it without relying on the registry's own query responses.
The signed tree head MUST be Ed25519-signed by the registry identity
key advertised at `/.well-known/oversight-registry`.
`/tlog/range` entries carry `index`, `leaf_hash`, `leaf_data`, and MAY
carry `leaf_data_hex`. `leaf_data_hex`, when present, is the exact leaf
bytes encoded as lowercase hex. Verifiers MUST recompute
`SHA-256(0x00 || leaf_bytes)` and compare it to `leaf_hash`; legacy
entries without `leaf_data_hex` use the UTF-8 bytes of `leaf_data`.
Registries MUST fail a range request rather than omit malformed,
non-contiguous, or hash-mismatched records from the requested window.
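
The verifier-side leaf check can be sketched with `hashlib`. Two details are assumptions of this sketch rather than statements of the spec: `leaf_hash` is taken to be hex-encoded, and `leaf_data` is taken to be a JSON string. The function names are hypothetical.

```python
import hashlib


def leaf_hash_matches(entry: dict) -> bool:
    """Recompute SHA-256(0x00 || leaf_bytes) and compare it to leaf_hash."""
    if "leaf_data_hex" in entry:
        leaf_bytes = bytes.fromhex(entry["leaf_data_hex"])  # exact leaf bytes
    else:
        leaf_bytes = entry["leaf_data"].encode("utf-8")  # legacy entries
    expected = hashlib.sha256(b"\x00" + leaf_bytes).hexdigest()
    # Assumption: leaf_hash is a hex string.
    return entry["leaf_hash"].lower() == expected


def verify_range(entries: list, first_index: int) -> bool:
    """Check that indexes are contiguous from first_index and every hash matches."""
    return all(
        entry.get("index") == first_index + offset and leaf_hash_matches(entry)
        for offset, entry in enumerate(entries)
    )
```

A monitor that sees `verify_range` fail should treat the whole window as suspect, mirroring the registry-side rule that a range request fails rather than silently dropping bad records.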
 
## Beacon endpoints

Beacon paths are normative because manifests embed URLs that follow
these shapes and the Python and Rust clients assemble them the same
way.

| Path | Kind stored in `events` |
|------|------------------------|
| `GET /p/{token_id}.png` | `http_img` |
| `GET /r/{token_id}`, `GET /ocsp/r/{token_id}` | `ocsp` |
| `GET /v/{token_id}`, `GET /lic/v/{token_id}` | `license` |

Responses MUST return 200 for well-formed token IDs so resolvers and
document viewers do not retry. The pixel endpoint returns a 1x1 PNG;
the OCSP endpoint returns an empty 200; the license endpoint returns
`{"valid": true}`.
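
The path-to-kind mapping in the table can be sketched as a small router. The function name is hypothetical, and the token alphabet (`A-Z`, `a-z`, digits, `_`, `-`) is an assumption; this document does not define what makes a token ID well-formed.

```python
import re

# Assumed token alphabet; the spec does not define well-formedness.
_TOKEN = r"(?P<token>[A-Za-z0-9_-]+)"

_ROUTES = [
    (re.compile(rf"^/p/{_TOKEN}\.png$"), "http_img"),
    (re.compile(rf"^(?:/ocsp)?/r/{_TOKEN}$"), "ocsp"),
    (re.compile(rf"^(?:/lic)?/v/{_TOKEN}$"), "license"),
]


def classify_beacon(path: str):
    """Return (event_kind, token_id) for a beacon path, or None for other paths."""
    for pattern, kind in _ROUTES:
        match = pattern.match(path)
        if match:
            return kind, match.group("token")
    return None
```

Both spellings of the OCSP and license paths map to the same stored kind, which matches the table: the prefix changes how the URL looks to a viewer, not what the registry records.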
 
## Federation notes

The wire format MUST NOT require the official `oversightprotocol.dev`
domain. Operators run their own registry and beacon domains; manifests
declare the registry URL and beacon descriptors unambiguously.

Operators SHOULD:

- Publish `/.well-known/oversight-registry` on HTTPS.
- Serve a stable `ed25519_pub`. Rotating this key breaks the chain
  of evidence for already-registered files.
- Run with Rekor attestation enabled so the public log is the root of
  trust for federated verifiers.

## Conformance

The repository ships a conformance harness at
`tests/test_registry_conformance.py` that exercises every endpoint in
this document against a registry URL. The harness is the canonical
test of whether an independent implementation is compatible. Operators
run it with:

```
OVERSIGHT_REGISTRY_URL=https://registry.example.org \
  python3 tests/test_registry_conformance.py
```

The harness uses a throwaway issuer identity, posts a minimal valid
manifest, and then validates the responses. It also checks representative
error envelope codes for malformed or missing inputs. Runs against the
local reference registry are included in CI; operator-hosted runs are the
interop acceptance gate for federation.