Zion Boggan
# Oversight Security Notes
 
This document is the honest threat-model companion to the protocol spec. It
uses RFC 2119 / BCP 14 language for requirements; those terms are interpreted
only when written in all capitals.
 
## Watermark Layer Limits

Each cell states whether the mark survives the transformation or adversary
named in the column header.

| Layer | Screenshot | Reformat | Manual retype | Motivated adversary with the vocabulary |
|-------|------------|----------|---------------|------------------------------------------|
| L1 zero-width | No | Often no | No | No |
| L2 whitespace | No | No | No | No |
| L3 semantic | Yes | Yes | Often yes | No; canonicalization can defeat it |
 
L1 and L2 are steganographic convenience layers. They are useful forensic
signals but fragile against normalization. L3 is more robust because it
encodes its payload in visible word choices, which also means the recipient
copy differs from the source text.
 
## L3 Semantic Watermark Safety

L3 is opt-in for wording-sensitive documents. The seal path defaults L3 off
for legal documents, regulatory filings, technical specifications, source
code, SQL, logs, and structured data. When L3 is enabled, users must
acknowledge that the recipient copy is textually non-identical to the
canonical source. The manifest records `canonical_content_hash` so a dispute
can compare the recipient copy against the original source bytes.
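
As a rough illustration of how a dispute might use `canonical_content_hash`, the sketch below checks candidate bytes against the recorded value. It assumes the field holds a hex-encoded SHA-256 digest of the source bytes, which this document does not specify, and `matches_canonical` is a hypothetical helper rather than part of Oversight.

```python
import hashlib
import hmac


def matches_canonical(candidate: bytes, manifest: dict) -> bool:
    """Return True if candidate hashes to the manifest's canonical_content_hash.

    Assumption: the field is a hex-encoded SHA-256 digest of the source bytes.
    """
    recorded = manifest["canonical_content_hash"].lower()
    actual = hashlib.sha256(candidate).hexdigest()
    # Constant-time comparison is not essential for a public hash, but it is harmless.
    return hmac.compare_digest(actual, recorded)


# Illustrative data only: one word differs between source and recipient copy.
source = b"The parties will begin work promptly.\n"
recipient_copy = b"The parties will start work promptly.\n"
manifest = {"canonical_content_hash": hashlib.sha256(source).hexdigest()}

source_ok = matches_canonical(source, manifest)
recipient_ok = matches_canonical(recipient_copy, manifest)
```

With L3 enabled, the recipient copy is expected to fail this check while the issuer's retained source passes. The hash establishes which bytes are canonical before any word-level comparison begins.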
 
Safe L3 application skips a conservative set of protected regions:

- RFC 2119 / BCP 14 requirement keywords such as `MUST`, `SHOULD`, and `MAY`
- numerical values with units or percentages
- quoted text, inline code, code blocks, and indented code
- ALL-CAPS defined terms
- likely source-code, SQL, log, and structured-data inputs
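
The protected-region idea can be sketched with a handful of regular expressions. Everything here is an illustrative assumption: the patterns, `protected_spans`, and `is_protected` are hypothetical and deliberately over-match, and the sketch does not cover indented code or whole-input classification (source code, SQL, logs, structured data).

```python
import re

# Hypothetical patterns, one per protected-region class; tuned to over-protect.
PROTECTED_PATTERNS = [
    re.compile(r"```.*?```", re.DOTALL),               # fenced code blocks
    re.compile(r"`[^`\n]+`"),                           # inline code
    re.compile(r'"[^"\n]*"'),                           # double-quoted text
    re.compile(r"\b\d+(?:\.\d+)?\s?(?:%|[A-Za-z]+)"),   # numbers with a unit or percent
    re.compile(r"\b[A-Z]{2,}(?:\s+[A-Z]{2,})*\b"),      # ALL-CAPS terms, e.g. MUST NOT
]


def protected_spans(text: str) -> list[tuple[int, int]]:
    """Return merged, sorted (start, end) offsets that a substitution must avoid."""
    spans = sorted(m.span() for p in PROTECTED_PATTERNS for m in p.finditer(text))
    merged: list[tuple[int, int]] = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # Overlapping or touching spans collapse into one region.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def is_protected(text: str, start: int, end: int) -> bool:
    """True if the candidate range [start, end) overlaps any protected span."""
    return any(s < end and start < e for s, e in protected_spans(text))
```

A synonym pass would call `is_protected` on each candidate position and skip those that return `True`. Erring toward over-protection costs watermark capacity but avoids altering normative or quantitative text.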
 
`boilerplate` L3 mode marks only header/footer/cover-page style regions and is
the preferred mode when a user wants a semantic signal for contracts or
filings without changing the body text.
 
## Collusion Threat Model

L3 synonym choices are deterministic per mark ID. If multiple recipients
collude and compare their copies, they can identify controlled vocabulary
positions and may canonicalize those positions before leaking. That can defeat
L3 attribution silently. Mitigations under evaluation:

- per-recipient vocabulary randomization
- stronger candidate scoring that models collusion edits
- warnings or thresholds for large recipient sets before L3 is enabled
 
Until those mitigations land, issuers should treat L3 as attribution evidence
against ordinary leaks and low-to-medium-effort stripping, not as a fully
collusion-resistant watermark.
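
To make the collusion exposure concrete, the sketch below estimates which word positions a coalition can see simply by comparing copies. It assumes each substitution swaps exactly one whitespace-delimited token, which may not hold for the real vocabulary, and `exposed_positions` is a hypothetical helper. An issuer could run something like it before enabling L3 for a large recipient set.

```python
def exposed_positions(copies: list[str]) -> list[int]:
    """Return token indexes where at least two recipient copies disagree.

    Assumption: every copy tokenizes to the same length, i.e. each synonym
    substitution replaces exactly one whitespace-delimited token.
    """
    if len(copies) < 2:
        raise ValueError("need at least two copies to compare")
    token_rows = [copy.split() for copy in copies]
    if len({len(row) for row in token_rows}) != 1:
        raise ValueError("copies do not align token-for-token")
    return [
        index
        for index, column in enumerate(zip(*token_rows))
        if len(set(column)) > 1
    ]


# Illustrative copies; the synonym pairs are invented for this example.
copy_a = "We will begin the review and provide a large summary"
copy_b = "We will start the review and provide a big summary"
copy_c = "We will begin the review and supply a big summary"

pair_exposure = exposed_positions([copy_a, copy_b])
trio_exposure = exposed_positions([copy_a, copy_b, copy_c])
```

Each additional colluder can only reveal more positions, never fewer, which is why recipient-set size is a natural trigger for the warnings or thresholds listed above.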
 
## GUI and Local File Safety

The desktop GUI and CLI treat private identity files as high-value key
material. Seal and open operations MUST NOT overwrite selected input paths or
files that parse as Oversight private keys. Overwriting an existing non-key
output requires explicit confirmation in the GUI, while the CLI fails closed,
so operators must choose a new path or deliberately remove the old file.
Private-key writes use atomic temporary files plus replacement; POSIX writes
request `0600`, and Windows GUI key generation applies a best-effort ACL
narrowing after replacement.
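
The write behavior described above can be approximated with the Python standard library. This is a minimal sketch, not the Oversight implementation: `write_private_key` and `write_output_fail_closed` are hypothetical names, and the Windows ACL narrowing and the check for files that parse as private keys are not shown.

```python
import os
import tempfile


def write_private_key(path: str, key_bytes: bytes) -> None:
    """Write key material through a temporary file, then replace the target."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the file exclusively; on POSIX its mode is 0600.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".key-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(key_bytes)
            handle.flush()
            os.fsync(handle.fileno())
        # Atomic on POSIX because source and target share a directory.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial temporary file; the target is left untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


def write_output_fail_closed(path: str, data: bytes) -> None:
    """CLI-style write: raise FileExistsError instead of overwriting."""
    with open(path, "xb") as handle:  # "x" requests exclusive creation
        handle.write(data)
```

Creating the temporary file in the target directory keeps the final rename on one filesystem, so a crash leaves either the old file or the new one, never a truncated key.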
 
Container parsing is intentionally strict. The unsigned suite byte in the binary
header must match the signed manifest suite, malformed JSON is normalized to
clean `ValueError` failures, unknown manifest fields are rejected, and trailing
bytes after the ciphertext are not accepted. These checks keep audit tools and
future consumers from trusting attacker-controlled side channels outside the
signed manifest and AEAD-protected ciphertext.
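
The strictness checks can be illustrated on a toy container. The layout below (one suite byte, then a length-prefixed manifest and a length-prefixed ciphertext) and the allowed field names are assumptions made for this sketch, not the real Oversight format. Signature verification and AEAD decryption are omitted.

```python
import json
import struct

# Hypothetical allow-list; the real manifest schema is defined by the protocol spec.
ALLOWED_MANIFEST_FIELDS = {"suite", "canonical_content_hash", "recipient"}


def parse_container(blob: bytes) -> tuple[dict, bytes]:
    """Strictly parse a toy container and return (manifest, ciphertext).

    Assumed layout: 1-byte suite | u32 BE manifest length | manifest JSON |
    u32 BE ciphertext length | ciphertext. Every failure is a ValueError.
    """
    if len(blob) < 5:
        raise ValueError("truncated header")
    header_suite = blob[0]
    (manifest_len,) = struct.unpack_from(">I", blob, 1)
    manifest_end = 5 + manifest_len
    if manifest_end + 4 > len(blob):
        raise ValueError("truncated manifest")
    try:
        manifest = json.loads(blob[5:manifest_end].decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError("malformed manifest JSON") from exc
    if not isinstance(manifest, dict):
        raise ValueError("manifest must be a JSON object")
    unknown = set(manifest) - ALLOWED_MANIFEST_FIELDS
    if unknown:
        raise ValueError(f"unknown manifest fields: {sorted(unknown)}")
    suite = manifest.get("suite")
    # type() check rejects JSON true, which would otherwise compare equal to 1.
    if type(suite) is not int or suite != header_suite:
        raise ValueError("header suite does not match manifest suite")
    (ciphertext_len,) = struct.unpack_from(">I", blob, manifest_end)
    ciphertext_end = manifest_end + 4 + ciphertext_len
    if ciphertext_end != len(blob):
        raise ValueError("truncated ciphertext or trailing bytes")
    return manifest, blob[manifest_end + 4 : ciphertext_end]
```

Requiring the declared lengths to account for every byte means nothing outside the manifest and ciphertext can ride along unnoticed, which is the property the trailing-bytes rule protects.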
 
## Passive Beacons

Passive beacons are forensic telemetry, not a detection guarantee. Absence of
a beacon does not prove absence of a leak. Corporate egress filtering,
air-gapped readers, privacy tools, sandboxed previews, and offline workflows
can suppress callbacks.
 
## Jurisdiction Policy

Jurisdiction-by-IP is a soft policy control. It is useful for honest clients,
audit trails, and routing decisions, but it is not a cryptographic security
boundary. VPNs, proxies, and corporate NATs can defeat or blur IP geolocation.
 
## RFC 3161 Timestamps

RFC 3161 timestamps prove a datum existed at or before the TSA signing time.
They do not prove authorship. The TSA remains a trust anchor. Rekor / DSSE
transparency reduces reliance on a single private timestamping service, but it
does not eliminate timestamp trust entirely.