Add Rust registry database validation

Co-authored-by: Codex (GPT-5.4) <noreply@openai.com>
e601bcd   Zion Boggan committed on May 20, 2026
CHANGELOG.md +5 -0
@@ -24,6 +24,11 @@
copies the Python reference registry's manifests, beacons, watermarks,
events, and corpus rows into the Rust SQLite schema while preserving event
IDs, corpus metadata, and registry evidence relationships.
+- **Rust registry integrity validation.** `oversight-registry --validate-db`
+ now checks migrated Rust registry databases for orphaned attribution rows,
+ identity mismatches, malformed manifest JSON, invalid manifest signatures,
+ and manifest/file ID divergence before operators declare migration burn-in
+ complete.
- **Rust policy test parity.** Fixed the `oversight-policy` crate's manifest
fixture after the v0.4.11 `Recipient.p256_pub` schema addition so the full
Rust workspace test suite compiles again.
README.md +16 -8
@@ -78,7 +78,14 @@ oversight-registry --db rust-registry.sqlite \
```
Remove `--migrate-dry-run` to copy manifests, beacons, watermarks, events, and
-corpus rows into the Rust database.
+corpus rows into the Rust database, then run:
+
+```bash
+oversight-registry --db rust-registry.sqlite --validate-db
+```
+
+The validator reports orphan rows, manifest/signature failures, and identity
+mismatches before an operator treats the migrated Rust database as live.
## Current main after v0.4.11
@@ -86,13 +93,14 @@ corpus rows into the Rust database.
path now includes the Compose/Caddy `live` profile, `.env.example` operator
secrets, and shared write-side token enforcement across the Python FastAPI
and Rust Axum registries. The Rust registry also has Python-to-Rust SQLite
-migration tooling (`--migrate-from`, `--migrate-dry-run`) so operators can
-preflight and copy attribution rows without treating the Python reference as
-a permanent production dependency.
+migration tooling (`--migrate-from`, `--migrate-dry-run`) and a native
+`--validate-db` integrity report so operators can preflight, copy, and verify
+attribution rows without treating the Python reference as a permanent
+production dependency.
The next Rust-registry gate is operational burn-in: longer-running deployment
-tests, migration validation against real operator databases, and a final
-wire-format stability declaration before v1.0.
+tests against real operator databases and a final wire-format stability
+declaration before v1.0.
## Quick start
@@ -426,13 +434,13 @@ project does not backport fixes below the current stable line.
| Rust oversight-formats | 40 | green |
| Rust oversight-manifest | 3 | green |
| Rust oversight-policy | 7 | green |
-| Rust oversight-registry | 8 | green |
+| Rust oversight-registry | 10 | green |
| Rust oversight-rekor | 10 | green |
| Rust oversight-semantic | 8 | green |
| Rust oversight-tlog | 7 | green |
| Rust oversight-watermark | 4 | green |
| Cross-language conformance | 3 | green |
-| Total automated Rust unit tests | 125 | all green |
+| Total automated Rust unit tests | 127 | all green |
## Design principles (what Oversight never does)
docs/REGISTRY_DEPLOYMENT.md +13 -2
@@ -124,5 +124,16 @@ oversight-registry \
The migration copies into the Rust target database after running its schema
migrations. It preserves `events.id`, `events.tlog_index`, corpus `metadata`,
and the manifest/beacon/watermark relationships that evidence bundles depend
-on. Keep the Python database as a rollback artifact until live conformance and
-evidence-bundle checks pass against the Rust service.
+on. Validate the copied database before switching traffic:
+
+```bash
+oversight-registry \
+ --db /var/lib/oversight/rust-registry.sqlite \
+ --validate-db
+```
+
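+The validation command prints a JSON report containing per-table row counts
+and failure counts for orphaned beacons, watermarks, events, and corpus rows;
+beacon, watermark, and event identity mismatches; malformed manifest JSON;
+invalid manifest signatures; and manifest/file ID divergence. It exits
+non-zero when any failure count is non-zero.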
+Keep the Python database as a rollback artifact until validation, live
+conformance, and evidence-bundle checks pass against the Rust service.
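
Because the `--validate-db` branch in `main.rs` returns an error when `report.ok` is false, the process exit status can gate a traffic switch. The following is a minimal sketch of such a gate, not part of this commit: the helper name `validate_db` and the paths in the usage note are hypothetical, and only the `--db` and `--validate-db` flags come from the documentation above.

```rust
use std::io;
use std::process::Command;

/// Hypothetical helper: runs `<binary> --db <db_path> --validate-db` and
/// reports whether the process exited successfully.
///
/// The child's stdout is inherited, so the JSON integrity report is still
/// printed for the operator. An `Err` means the binary could not be started
/// at all, which is different from a validation failure (`Ok(false)`).
fn validate_db(binary: &str, db_path: &str) -> io::Result<bool> {
    let status = Command::new(binary)
        .args(["--db", db_path, "--validate-db"])
        .status()?;
    Ok(status.success())
}
```

A deployment script might call `validate_db("oversight-registry", "/var/lib/oversight/rust-registry.sqlite")` and keep the Python registry live unless the result is `Ok(true)`.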
docs/ROADMAP.md +6 -3
@@ -240,8 +240,11 @@ reference for write-side operator-token auth and DNS bridge bearer/header
auth. As of 2026-05-17, `oversight-registry --migrate-from` can copy the
Python registry's manifests, beacons, watermarks, events, and corpus rows
into the Rust SQLite schema, with `--migrate-dry-run` for count-only
-preflight. Remaining work: longer-running deployment tests and a wire-format
-stability declaration before declaring v1.0 ready.
+preflight. As of 2026-05-20, `--validate-db` checks the copied Rust database
+for orphan rows, identity mismatches, malformed manifest JSON, invalid
+manifest signatures, and manifest/file ID divergence. Remaining work:
+longer-running deployment tests and a wire-format stability declaration before
+declaring v1.0 ready.
---
@@ -314,7 +317,7 @@ via VM and retype, hardware-key pull mid-open.
| 9 | Hybrid PQ decrypt in browser | Shipped (2026-05-03) |
| 10 | Outlook add-in | Next |
| 11 | Hardware KeyProvider in Rust | Suite shipped (v0.4.11); PIV provider next |
-| 12 | Rust Axum registry, migration tooling | Migration tooling shipped; deployment burn-in next |
+| 12 | Rust Axum registry, migration tooling | Migration validation shipped; deployment burn-in next |
| 13 | arXiv preprint, threat-model repo document | Mid-term |
| 14 | IETF Internet-Draft, CFRG or equivalent BoF | Mid-term |
| 15 | USENIX Security Cycle 2, Black Hat EU 2026 | Mid-term |
oversight-rust/oversight-registry/src/db.rs +250 -2
@@ -18,6 +18,31 @@ pub struct MigrationReport {
pub corpus: i64,
}
+#[derive(Debug, Clone, serde::Serialize, serde::Deserialize, PartialEq, Eq)]
+pub struct RegistryCounts {
+ pub manifests: i64,
+ pub beacons: i64,
+ pub watermarks: i64,
+ pub events: i64,
+ pub corpus: i64,
+}
+
+#[derive(Debug, Clone, serde::Serialize, serde::Deserialize, PartialEq, Eq)]
+pub struct RegistryIntegrityReport {
+ pub ok: bool,
+ pub counts: RegistryCounts,
+ pub orphan_beacons: i64,
+ pub orphan_watermarks: i64,
+ pub orphan_events: i64,
+ pub orphan_corpus: i64,
+ pub beacon_identity_mismatches: i64,
+ pub watermark_identity_mismatches: i64,
+ pub event_identity_mismatches: i64,
+ pub malformed_manifest_json: i64,
+ pub invalid_manifest_signatures: i64,
+ pub mismatched_manifest_file_ids: i64,
+}
+
pub async fn create_pool(db_path: &Path) -> Result<SqlitePool> {
if let Some(parent) = db_path.parent() {
std::fs::create_dir_all(parent)
@@ -198,6 +223,110 @@ pub async fn migrate_from_sqlite(
result
}
+pub async fn validate_registry_integrity(pool: &SqlitePool) -> Result<RegistryIntegrityReport> {
+ let counts = registry_counts(pool).await?;
+ let orphan_beacons = count_query(
+ pool,
+ "SELECT COUNT(*) FROM beacons b LEFT JOIN manifests m ON b.file_id = m.file_id WHERE m.file_id IS NULL",
+ )
+ .await?;
+ let orphan_watermarks = count_query(
+ pool,
+ "SELECT COUNT(*) FROM watermarks w LEFT JOIN manifests m ON w.file_id = m.file_id WHERE m.file_id IS NULL",
+ )
+ .await?;
+ let orphan_events = count_query(
+ pool,
+ "SELECT COUNT(*) FROM events e LEFT JOIN manifests m ON e.file_id = m.file_id WHERE e.file_id IS NOT NULL AND m.file_id IS NULL",
+ )
+ .await?;
+ let orphan_corpus = count_query(
+ pool,
+ "SELECT COUNT(*) FROM corpus c LEFT JOIN manifests m ON c.file_id = m.file_id WHERE m.file_id IS NULL",
+ )
+ .await?;
+ let beacon_identity_mismatches = count_query(
+ pool,
+ "SELECT COUNT(*) FROM beacons b JOIN manifests m ON b.file_id = m.file_id WHERE b.recipient_id != m.recipient_id OR b.issuer_id != m.issuer_id",
+ )
+ .await?;
+ let watermark_identity_mismatches = count_query(
+ pool,
+ "SELECT COUNT(*) FROM watermarks w JOIN manifests m ON w.file_id = m.file_id WHERE w.recipient_id != m.recipient_id OR w.issuer_id != m.issuer_id",
+ )
+ .await?;
+ let event_identity_mismatches = count_query(
+ pool,
+ "SELECT COUNT(*) FROM events e JOIN manifests m ON e.file_id = m.file_id WHERE (e.recipient_id IS NOT NULL AND e.recipient_id != m.recipient_id) OR (e.issuer_id IS NOT NULL AND e.issuer_id != m.issuer_id)",
+ )
+ .await?;
+
+ let mut malformed_manifest_json = 0;
+ let mut invalid_manifest_signatures = 0;
+ let mut mismatched_manifest_file_ids = 0;
+ let manifest_rows: Vec<(String, String)> =
+ sqlx::query_as("SELECT file_id, manifest_json FROM manifests")
+ .fetch_all(pool)
+ .await?;
+
+ for (file_id, manifest_json) in manifest_rows {
+ match oversight_manifest::Manifest::from_json(manifest_json.as_bytes()) {
+ Ok(manifest) => {
+ if manifest.file_id != file_id {
+ mismatched_manifest_file_ids += 1;
+ }
+ if !manifest.verify().unwrap_or(false) {
+ invalid_manifest_signatures += 1;
+ }
+ }
+ Err(_) => {
+ malformed_manifest_json += 1;
+ }
+ }
+ }
+
+ let ok = orphan_beacons == 0
+ && orphan_watermarks == 0
+ && orphan_events == 0
+ && orphan_corpus == 0
+ && beacon_identity_mismatches == 0
+ && watermark_identity_mismatches == 0
+ && event_identity_mismatches == 0
+ && malformed_manifest_json == 0
+ && invalid_manifest_signatures == 0
+ && mismatched_manifest_file_ids == 0;
+
+ Ok(RegistryIntegrityReport {
+ ok,
+ counts,
+ orphan_beacons,
+ orphan_watermarks,
+ orphan_events,
+ orphan_corpus,
+ beacon_identity_mismatches,
+ watermark_identity_mismatches,
+ event_identity_mismatches,
+ malformed_manifest_json,
+ invalid_manifest_signatures,
+ mismatched_manifest_file_ids,
+ })
+}
+
+async fn registry_counts(pool: &SqlitePool) -> Result<RegistryCounts> {
+ Ok(RegistryCounts {
+ manifests: count_query(pool, "SELECT COUNT(*) FROM manifests").await?,
+ beacons: count_query(pool, "SELECT COUNT(*) FROM beacons").await?,
+ watermarks: count_query(pool, "SELECT COUNT(*) FROM watermarks").await?,
+ events: count_query(pool, "SELECT COUNT(*) FROM events").await?,
+ corpus: count_query(pool, "SELECT COUNT(*) FROM corpus").await?,
+ })
+}
+
+async fn count_query(pool: &SqlitePool, sql: &str) -> Result<i64> {
+ let (count,): (i64,) = sqlx::query_as(sql).fetch_one(pool).await?;
+ Ok(count)
+}
+
async fn validate_source_schema(conn: &mut sqlx::pool::PoolConnection<sqlx::Sqlite>) -> Result<()> {
for table in MIGRATED_TABLES {
let exists: Option<(String,)> = sqlx::query_as(
@@ -561,6 +690,8 @@ pub async fn get_semantic_candidates(
#[cfg(test)]
mod tests {
use super::*;
+ use oversight_crypto::ClassicIdentity;
+ use oversight_manifest::{Manifest, Recipient};
use std::path::PathBuf;
fn temp_dir(label: &str) -> PathBuf {
@@ -575,13 +706,14 @@ mod tests {
}
async fn seed_source(pool: &SqlitePool) {
+ let (issuer_pub, manifest_json) = signed_manifest_json("file-1");
upsert_manifest(
pool,
"file-1",
"recipient-1",
"issuer-1",
- &"ab".repeat(32),
- r#"{"file_id":"file-1"}"#,
+ &issuer_pub,
+ &manifest_json,
10,
)
.await
@@ -637,6 +769,35 @@ mod tests {
.unwrap();
}
+ fn signed_manifest_json(file_id: &str) -> (String, String) {
+ let issuer = ClassicIdentity::generate();
+ let recipient = ClassicIdentity::generate();
+ let mut manifest = Manifest::new(
+ "fixture.txt",
+ "ab".repeat(32),
+ 4096,
+ "issuer-1",
+ hex::encode(issuer.ed25519_pub),
+ Recipient {
+ recipient_id: "recipient-1".into(),
+ x25519_pub: hex::encode(recipient.x25519_pub),
+ ed25519_pub: None,
+ p256_pub: None,
+ },
+ "https://registry.test",
+ "text/plain",
+ None,
+ None,
+ "GLOBAL",
+ );
+ manifest.file_id = file_id.into();
+ manifest.sign(issuer.ed25519_priv.as_ref()).unwrap();
+ (
+ hex::encode(issuer.ed25519_pub),
+ String::from_utf8(manifest.to_json().unwrap()).unwrap(),
+ )
+ }
+
#[tokio::test]
async fn migrate_from_sqlite_copies_python_registry_tables() {
let source_dir = temp_dir("source");
@@ -728,4 +889,91 @@ mod tests {
let _ = std::fs::remove_dir_all(source_dir);
let _ = std::fs::remove_dir_all(dest_dir);
}
+
+ #[tokio::test]
+ async fn validate_registry_integrity_accepts_clean_rows() {
+ let dir = temp_dir("validate-clean");
+ std::fs::create_dir_all(&dir).unwrap();
+ let db_path = dir.join("registry.sqlite");
+ let pool = create_pool(&db_path).await.unwrap();
+ run_migrations(&pool).await.unwrap();
+ seed_source(&pool).await;
+
+ let report = validate_registry_integrity(&pool).await.unwrap();
+ assert!(report.ok);
+ assert_eq!(report.counts.manifests, 1);
+ assert_eq!(report.counts.beacons, 1);
+ assert_eq!(report.malformed_manifest_json, 0);
+ assert_eq!(report.invalid_manifest_signatures, 0);
+
+ pool.close().await;
+ let _ = std::fs::remove_dir_all(dir);
+ }
+
+ #[tokio::test]
+ async fn validate_registry_integrity_reports_bad_rows() {
+ let dir = temp_dir("validate-bad");
+ std::fs::create_dir_all(&dir).unwrap();
+ let db_path = dir.join("registry.sqlite");
+ let pool = create_pool(&db_path).await.unwrap();
+ run_migrations(&pool).await.unwrap();
+ seed_source(&pool).await;
+
+ sqlx::query(
+ "INSERT INTO manifests (file_id, recipient_id, issuer_id, issuer_ed25519_pub, manifest_json, registered_at) VALUES (?, ?, ?, ?, ?, ?)",
+ )
+ .bind("bad-file")
+ .bind("recipient-1")
+ .bind("issuer-1")
+ .bind("00")
+ .bind("{")
+ .bind(20_i64)
+ .execute(&pool)
+ .await
+ .unwrap();
+ upsert_beacon(&pool, "orphan-token", "missing-file", "r", "i", "dns", 21)
+ .await
+ .unwrap();
+ upsert_watermark(&pool, "orphan-mark", "L1", "missing-file", "r", "i", 21)
+ .await
+ .unwrap();
+ insert_event(
+ &pool,
+ "orphan-token",
+ Some("missing-file"),
+ Some("r"),
+ Some("i"),
+ "dns",
+ None,
+ None,
+ None,
+ 21,
+ None,
+ None,
+ )
+ .await
+ .unwrap();
+ sqlx::query(
+ "INSERT INTO corpus (file_id, hash_kind, hash_value, metadata, registered_at) VALUES (?, ?, ?, ?, ?)",
+ )
+ .bind("missing-file")
+ .bind("perceptual")
+ .bind("phash-missing")
+ .bind(None::<String>)
+ .bind(21_i64)
+ .execute(&pool)
+ .await
+ .unwrap();
+
+ let report = validate_registry_integrity(&pool).await.unwrap();
+ assert!(!report.ok);
+ assert_eq!(report.orphan_beacons, 1);
+ assert_eq!(report.orphan_watermarks, 1);
+ assert_eq!(report.orphan_events, 1);
+ assert_eq!(report.orphan_corpus, 1);
+ assert_eq!(report.malformed_manifest_json, 1);
+
+ pool.close().await;
+ let _ = std::fs::remove_dir_all(dir);
+ }
}
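
The orphan checks in `validate_registry_integrity` are anti-joins: a child row counts as an orphan when it names a `file_id` with no matching manifest row. Events are the one case where `file_id` may be NULL, and those rows are skipped rather than counted. The following is an in-memory sketch of that rule for illustration only. The function `count_orphans` is hypothetical and stands in for the SQL; the crate itself runs the `LEFT JOIN ... WHERE m.file_id IS NULL` queries shown in the diff.

```rust
use std::collections::HashSet;

/// Hypothetical in-memory analogue of the orphan queries.
///
/// `manifest_ids` plays the role of `manifests.file_id`, and each entry of
/// `child_file_ids` is the `file_id` column of one child row. `None` models
/// a NULL `file_id`, which the events query excludes via
/// `e.file_id IS NOT NULL`.
fn count_orphans(manifest_ids: &HashSet<&str>, child_file_ids: &[Option<&str>]) -> i64 {
    child_file_ids
        .iter()
        .filter(|id| match **id {
            // A row is an orphan only if it names a manifest that is absent.
            Some(file_id) => !manifest_ids.contains(file_id),
            // NULL file_id: nothing to join against, so not an orphan.
            None => false,
        })
        .count() as i64
}
```

With a single manifest `file-1`, the child rows `Some("file-1")`, `Some("missing-file")`, and `None` yield one orphan, the same shape as the `missing-file` rows seeded in `validate_registry_integrity_reports_bad_rows`.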
oversight-rust/oversight-registry/src/main.rs +17 -0
@@ -61,6 +61,12 @@ struct Args {
#[arg(long, help = "Report migration row counts without writing to --db")]
migrate_dry_run: bool,
+
+ #[arg(
+ long,
+ help = "Validate registry database relationships and signed manifests, print JSON, and exit"
+ )]
+ validate_db: bool,
}
pub struct AppState {
@@ -378,6 +384,17 @@ async fn main() -> anyhow::Result<()> {
return Ok(());
}
+ if args.validate_db {
+ let report = db::validate_registry_integrity(&pool)
+ .await
+ .map_err(|e| anyhow::anyhow!("registry integrity validation failed: {e}"))?;
+ println!("{}", serde_json::to_string_pretty(&report)?);
+ if !report.ok {
+ return Err(anyhow::anyhow!("registry integrity validation failed"));
+ }
+ return Ok(());
+ }
+
let tlog_dir = data_dir.join("tlog");
let identity = load_or_create_identity(&data_dir);
let tlog = TransparencyLog::open_with_signer(