Zion Boggan
# TreeTrace lineage schema v0.3

`.treetrace/tree.json` is an open, vendor-neutral format for prompt lineage and agent-regression analysis in AI-assisted projects.

TreeTrace records the human steering layer: what was asked, what changed direction, what was corrected, what was abandoned, what was rejected, what future agents should remember, and which failures should become evals.

## Layering

| Layer | Standard or artifact | What it records |
|-------|----------------------|-----------------|
| Code attribution | Agent Trace | which lines were AI-generated, by which model, linked to which conversation |
| Runtime telemetry | OpenTelemetry `gen_ai` | per-call spans for operators |
| Build integrity | SLSA / in-toto | signed provenance of build artifacts |
| Human steering | TreeTrace | prompt lineage, corrections, abandoned paths, rejections, lessons, eval candidates |

Agent Trace answers "which code came from AI?" TreeTrace answers "how did the human have to steer the agent?"

## Top-Level Shape

```jsonc
{
  "schemaVersion": "0.3",
  "generator": { "name": "treetrace", "version": "0.9.0", "url": "..." },
  "project": { "name": "...", "generatedAt": "ISO-8601", "sourceType": "claude-code-jsonl" },
  "stats": {
    "prompts": 41, "sessions": 6, "days": 9,
    "corrections": 3, "rejections": 4,
    "toolUses": 12, "filesTouched": 7,
    "inputTokens": 8400, "outputTokens": 2100,
    "models": ["claude-opus-4-8"],
    "firstTs": "ISO-8601", "lastTs": "ISO-8601"
  },
  "analysis": {
    "failureSignals": 11,
    "correctionChains": 3,
    "evalCandidates": 6,
    "lessons": 7
  },
  "sessions": [
    {
      "id": "...", "title": "...",
      "firstTs": "ISO-8601", "lastTs": "ISO-8601",
      "promptCount": 7, "isContinuation": false,
      "inputTokens": 8400, "outputTokens": 2100
    }
  ],
  "nodes": [ /* PromptNode */ ],
  "edges": [ /* Edge */ ],
  "correctionChains": [ /* CorrectionChain */ ],
  "lessons": [ /* Lesson */ ],
  "evalCandidates": [ /* EvalCandidate */ ]
}
```

All v0.3 additions are optional and additive. Consumers that only understand v0.2 can keep reading `nodes` and `edges` and ignore `rejections`.
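
A minimal sketch of such a tolerant reader, in Python for illustration: it keeps only `nodes` and `edges`, defaults missing top-level arrays to empty lists, and drops node fields it does not know (such as the v0.3 `rejections` array). The helper name `read_lineage` and the chosen set of core fields are assumptions, not part of the schema.

```python
import json
from typing import Any


def read_lineage(raw: str) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Hypothetical v0.2-style reader: return (nodes, edges) from tree.json text."""
    doc = json.loads(raw)
    # Top-level arrays may be absent, so fall back to empty lists.
    nodes = doc.get("nodes", [])
    edges = doc.get("edges", [])
    # Keep a fixed set of core PromptNode fields; unknown keys such as the
    # v0.3 "rejections" array are silently dropped instead of raising.
    core = ("id", "parentId", "kind", "title", "status")
    slim = [{key: node.get(key) for key in core} for node in nodes]
    return slim, edges


sample = json.dumps({
    "schemaVersion": "0.3",
    "nodes": [{
        "id": "node_001", "parentId": None, "kind": "root",
        "title": "Build a CLI", "status": "accepted", "rejections": [],
    }],
})
nodes, edges = read_lineage(sample)
```

Here the sample omits `edges` entirely, and the reader still returns an empty list for it.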
56
 
57
## stats fields
58
 
59
| Field | Type | Meaning |
60
|-------|------|---------|
61
| `prompts` | number | total classified prompt nodes |
62
| `rawPrompts` | number | total raw prompt records across all sessions |
63
| `sessions` | number | sessions that contained at least one prompt |
64
| `days` | number | calendar days spanned |
65
| `corrections` | number | nodes classified as `correction` |
66
| `scopeChanges` | number | nodes classified as `scope-change` |
67
| `checkpoints` | number | nodes classified as `checkpoint` |
68
| `abandonedBranches` | number | distinct abandoned sub-trees |
69
| `rejections` | number | total rejection/refusal/decline events |
70
| `rejectionsByKind` | object | count per rejection kind |
71
| `toolUses` | number | total tool invocations across all sessions |
72
| `filesTouched` | number | distinct file paths referenced (Edit/Write paths and shell command paths) |
73
| `inputTokens` | number | sum of input tokens across all sessions (0 when not available for the source format) |
74
| `outputTokens` | number | sum of output tokens across all sessions (0 when not available for the source format) |
75
| `models` | string[] | deduplicated list of model identifiers seen across all sessions |
76
| `firstTs` | string \| null | ISO-8601 timestamp of the earliest record |
77
| `lastTs` | string \| null | ISO-8601 timestamp of the latest record |
78
 
79
Token coverage by source: Claude Code JSONL (full), Codex rollout (full), Gemini CLI (full), ChatGPT export (none), Copilot (none), Cursor (none), Grok (none), plain transcript (none).
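
One way to populate `rejections` and `rejectionsByKind`, sketched in Python. It assumes both are tallied from the per-node `rejections` arrays and that `rejectionsByKind` is keyed by the Rejection `kind` value; the function name `rejection_stats` is hypothetical.

```python
from collections import Counter
from typing import Any


def rejection_stats(nodes: list[dict[str, Any]]) -> tuple[int, dict[str, int]]:
    """Hypothetical helper: derive stats.rejections and stats.rejectionsByKind."""
    counts: Counter = Counter()
    for node in nodes:
        # "rejections" is optional (v0.3), so nodes without it contribute nothing.
        for rejection in node.get("rejections", []):
            counts[rejection["kind"]] += 1
    return sum(counts.values()), dict(counts)


total, by_kind = rejection_stats([
    {"id": "node_001", "rejections": [{"kind": "user_interrupt"}, {"kind": "user_declined_tool"}]},
    {"id": "node_002", "rejections": [{"kind": "user_declined_tool"}]},
    {"id": "node_003"},
])
```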

## sessions[] fields

| Field | Type | Meaning |
|-------|------|---------|
| `id` | string | session identifier |
| `title` | string \| null | session title if captured |
| `firstTs` | string \| null | ISO-8601 |
| `lastTs` | string \| null | ISO-8601 |
| `promptCount` | number | classified prompts in this session |
| `isContinuation` | boolean | `true` when the session was resumed from a prior compact summary |
| `inputTokens` | number | input tokens for this session (0 when not available) |
| `outputTokens` | number | output tokens for this session (0 when not available) |
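
Because the `stats` token fields are defined as sums over sessions, a consumer can cross-check the two. A minimal Python sketch follows. `token_totals` is a hypothetical name, and counting a missing per-session field as 0 mirrors the "0 when not available" rule above.

```python
from typing import Any


def token_totals(sessions: list[dict[str, Any]]) -> tuple[int, int]:
    """Sum per-session token counts; absent fields count as 0."""
    input_total = sum(s.get("inputTokens", 0) for s in sessions)
    output_total = sum(s.get("outputTokens", 0) for s in sessions)
    return input_total, output_total


doc = {
    "stats": {"inputTokens": 8400, "outputTokens": 2100},
    "sessions": [
        {"id": "s1", "inputTokens": 5000, "outputTokens": 1500},
        {"id": "s2", "inputTokens": 3400, "outputTokens": 600},
    ],
}
# True when stats agrees with the per-session sums.
consistent = token_totals(doc["sessions"]) == (
    doc["stats"]["inputTokens"],
    doc["stats"]["outputTokens"],
)
```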

## PromptNode

| Field | Type | Meaning |
|-------|------|---------|
| `id` | string | stable within the file (`node_001`, etc.) |
| `parentId` | string \| null | lineage parent (`null` for a root node) |
| `role` | `"user"` | always `"user"` in v0.3; other values are reserved for future system/developer nodes |
| `kind` | enum | `root`, `direction`, `correction`, `scope-change`, `checkpoint`, `question`, `rejection` |
| `title` | string | first-sentence distillation |
| `text` | string | full prompt text after redaction |
| `status` | enum | `accepted`, `abandoned` |
| `nudges` | number | count of "continue"-style acknowledgements folded into this node |
| `reruns` | number | count of repeated instruction re-issues folded into this node |
| `session` | string | session id this prompt came from |
| `timestamp` | string \| null | ISO-8601 |
| `model` | string \| null | model that handled this turn (from the first action on the turn; null if not available) |
| `actions` | Action[] | tool invocations made in response to this prompt, after redaction |
| `failureSignals` | FailureSignal[] | optional v0.2 failure labels attached to this node |
| `evalCandidate` | boolean | whether this node contributes to an eval candidate |
| `lessonIds` | string[] | lessons derived from this node |
| `rejections` | Rejection[] | optional v0.3 typed rejection/refusal/decline events captured on this turn |
| `sourceEventIds` | string[] | local transcript record UUIDs; raw transcripts are never exported |
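
Because each node names its parent through `parentId`, the full lineage of any prompt can be recovered by walking up to the root. Here is a Python sketch. The function name `lineage` and the cycle check are illustrative additions, not schema requirements.

```python
from typing import Any, Optional


def lineage(nodes: list[dict[str, Any]], node_id: str) -> list[str]:
    """Return node ids from the root down to node_id by following parentId."""
    by_id = {node["id"]: node for node in nodes}
    path: list[str] = []
    current: Optional[str] = node_id
    while current is not None:
        if current in path:
            raise ValueError(f"parentId cycle at {current}")
        path.append(current)
        # A missing parent id raises KeyError, surfacing a dangling reference.
        current = by_id[current]["parentId"]
    path.reverse()
    return path


nodes = [
    {"id": "node_001", "parentId": None, "kind": "root"},
    {"id": "node_002", "parentId": "node_001", "kind": "direction"},
    {"id": "node_003", "parentId": "node_002", "kind": "correction"},
]
chain = lineage(nodes, "node_003")
```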

## Action

```jsonc
{ "tool": "Edit", "file": "/src/auth.js", "command": null, "model": "claude-opus-4-8" }
```

| Field | Type | Meaning |
|-------|------|---------|
| `tool` | string \| null | tool name (`Bash`, `Edit`, `Write`, `Read`, etc.) |
| `file` | string \| null | file path from a structured `file_path` input; redacted |
| `command` | string \| null | shell command string for `Bash` tool calls; redacted |
| `model` | string \| null | model that issued this tool call; null when not available |

`file` and `command` values are run through the same redaction gate as `node.text`. An `action` whose `command` or `file` contains a secret will have that value replaced with a `[REDACTED:rule-id]` marker before export.
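
A minimal sketch of applying one redaction gate to both `file` and `command`, in Python. The rule table is hypothetical: it holds a single AWS-style access-key pattern under the made-up rule id `aws-access-key-id`. Replacing the whole value, rather than only the matched substring, is one reading of the sentence above.

```python
import re
from typing import Any, Optional

# Hypothetical rule table: rule id -> pattern. TreeTrace's real rules are not
# listed in this schema; this single pattern is only an illustration.
RULES = {
    "aws-access-key-id": re.compile(r"AKIA[0-9A-Z]{16}"),
}


def redact(value: Optional[str]) -> Optional[str]:
    """Replace the whole value with [REDACTED:rule-id] if any rule matches."""
    if value is None:
        return None
    for rule_id, pattern in RULES.items():
        if pattern.search(value):
            return f"[REDACTED:{rule_id}]"
    return value


def redact_action(action: dict[str, Any]) -> dict[str, Any]:
    """Run file and command through the same gate; other fields pass through."""
    return {
        **action,
        "file": redact(action.get("file")),
        "command": redact(action.get("command")),
    }


clean = redact_action({
    "tool": "Bash",
    "file": None,
    "command": "export AWS_ACCESS_KEY_ID=AKIAABCDEFGHIJKLMNOP",
    "model": None,
})
```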

The PromptNode `rejection` kind (v0.3) is assigned to synthetic nodes that exist only to carry a rejection signal, e.g. a tool-result rejection that arrived before any human-typed prompt. Such nodes have empty `text`, a `title` derived from the rejection kind(s), and one or more entries in `rejections`.

## FailureSignal

```jsonc
{
  "type": "ignored_constraint",
  "confidence": 0.82,
  "evidence": "User corrected the agent after it built a web app despite asking for a CLI.",
  "resolvedBy": "node_004"
}
```

Initial `type` values:

- `ignored_constraint`
- `misunderstood_goal`
- `scope_drift`
- `wrong_tool_choice`
- `hallucinated_file_or_path` (also written as `hallucinated_file_or_api` in older exports; treat as equivalent)
- `repeated_failed_fix`
- `overbuilt_solution`
- `underbuilt_solution`
- `security_or_privacy_risk`
- `dependency_or_environment_mismatch`
- `format_violation`
- `user_frustration`
- `abandoned_path`
- `user_rejected_action` (v0.3)
- `tool_execution_failed` (v0.3)
- `model_refused` (v0.3)
- `permission_denied` (v0.3)

The enum may gain values. Consumers should treat unknown values as advisory labels.
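
A consumer can fold the legacy spelling into the current one and keep unrecognized values as advisory labels instead of rejecting the record. Here is a Python sketch. `normalize_failure_type` is a hypothetical name.

```python
KNOWN_FAILURE_TYPES = frozenset({
    "ignored_constraint", "misunderstood_goal", "scope_drift",
    "wrong_tool_choice", "hallucinated_file_or_path", "repeated_failed_fix",
    "overbuilt_solution", "underbuilt_solution", "security_or_privacy_risk",
    "dependency_or_environment_mismatch", "format_violation",
    "user_frustration", "abandoned_path", "user_rejected_action",
    "tool_execution_failed", "model_refused", "permission_denied",
})

# Older exports used this spelling; it is equivalent to the current value.
LEGACY_ALIASES = {"hallucinated_file_or_api": "hallucinated_file_or_path"}


def normalize_failure_type(value: str) -> tuple[str, bool]:
    """Return (canonical type, is_known); unknown values are kept, not rejected."""
    canonical = LEGACY_ALIASES.get(value, value)
    return canonical, canonical in KNOWN_FAILURE_TYPES
```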

## Rejection (v0.3)

```jsonc
{
  "kind": "user_declined_tool",
  "source": "tool_result",
  "confidence": 1.0,
  "toolUseId": "toolu_0123ABC",
  "tool": "Bash",
  "ts": "2026-06-18T12:34:56.789Z",
  "evidence": "The user doesn't want to proceed with this tool use..."
}
```

`kind` enum:

- `user_declined_tool` - human rejected a proposed tool action (Claude Code's canonical "user doesn't want to proceed" text)
- `user_interrupt` - human pressed Esc / interrupt mid-response
- `user_text_decline` - human typed an explicit decline (`no, don't`, `stop`, `cancel`)
- `tool_execution_error` - tool ran and returned `is_error: true` for a non-decline reason
- `permission_denied` - environment denied the action (`permission denied`, `EACCES`, `Operation cancelled`)
- `model_refusal` - the model declined the request (`stop_reason: "refusal"` or refusal text)

`source` enum: `tool_result`, `text`, `stop_reason`, `text_heuristic`.

`confidence` uses the same bands as FailureSignal `confidence`: 0.95 or higher is verified, 0.8 or higher is high, 0.65 or higher is confirmed, and anything lower is inferred.
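
The bands translate directly into threshold checks. This Python sketch assumes each threshold is inclusive, reading "0.95+" as "0.95 or higher". The function name `confidence_band` is hypothetical.

```python
def confidence_band(confidence: float) -> str:
    """Map a numeric confidence to its band label, checking the highest band first."""
    if confidence >= 0.95:
        return "verified"
    if confidence >= 0.8:
        return "high"
    if confidence >= 0.65:
        return "confirmed"
    return "inferred"


bands = [confidence_band(c) for c in (1.0, 0.82, 0.7, 0.4)]
```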

`evidence` is truncated and redacted and carries enough context to disambiguate the rejection class. It is `null` when only the structured signal (e.g. `stop_reason`) is available.

## Edge

```jsonc
{ "from": "node_001", "to": "node_002", "relationship": "refines" }
```

`relationship` is derived from the child node's `kind`:

- `refines`
- `corrects`
- `expands`
- `checkpoints`
- `asks`
- `rejects` (v0.3, from `kind: "rejection"`)
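
Edges can be rebuilt from `parentId` and the child's `kind`. The schema states only that `rejection` maps to `rejects`. The other pairings below assume the relationship list follows the order of the non-root `kind` values, so treat them as illustrative. The code is a Python sketch with hypothetical names.

```python
from typing import Any

# Assumed kind -> relationship pairing. Only "rejection" -> "rejects" is stated
# by the schema; the rest are inferred from list order.
RELATIONSHIP_BY_KIND = {
    "direction": "refines",
    "correction": "corrects",
    "scope-change": "expands",
    "checkpoint": "checkpoints",
    "question": "asks",
    "rejection": "rejects",
}


def derive_edges(nodes: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Emit one Edge per node that has a parent; roots get no incoming edge."""
    edges = []
    for node in nodes:
        parent = node.get("parentId")
        if parent is None:
            continue
        edges.append({
            "from": parent,
            "to": node["id"],
            "relationship": RELATIONSHIP_BY_KIND[node["kind"]],
        })
    return edges


edges = derive_edges([
    {"id": "node_001", "parentId": None, "kind": "root"},
    {"id": "node_002", "parentId": "node_001", "kind": "direction"},
    {"id": "node_003", "parentId": "node_002", "kind": "correction"},
])
```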

## CorrectionChain

```jsonc
{
  "id": "chain_001",
  "failureNodeId": "node_003",
  "correctionNodeId": "node_004",
  "resolvedNodeId": "node_006",
  "failureType": "ignored_constraint",
  "confidence": "high",
  "summary": "The agent initially pursued a web app; the user corrected it toward a zero-config CLI."
}
```

A correction chain links a likely failure node to the user correction that changed direction. It does not require assistant output; it is derived from prompt topology and user text. Low-confidence chains may be omitted.
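
Chains refer to nodes by id, so a consumer can check that every reference resolves before trusting a chain. Here is a Python sketch. The helper name `dangling_refs` is hypothetical. Skipping a `null` or missing reference assumes such values are allowed, for example on an unresolved chain, which the schema does not state.

```python
from typing import Any

CHAIN_REF_FIELDS = ("failureNodeId", "correctionNodeId", "resolvedNodeId")


def dangling_refs(doc: dict[str, Any]) -> list[tuple[str, str, str]]:
    """List (chain id, field, node id) for chain references with no matching node."""
    node_ids = {node["id"] for node in doc.get("nodes", [])}
    problems = []
    for chain in doc.get("correctionChains", []):
        for field in CHAIN_REF_FIELDS:
            ref = chain.get(field)
            # Assumption: a null or absent reference is allowed and skipped.
            if ref is not None and ref not in node_ids:
                problems.append((chain["id"], field, ref))
    return problems


problems = dangling_refs({
    "nodes": [{"id": "node_003"}, {"id": "node_004"}],
    "correctionChains": [{
        "id": "chain_001",
        "failureNodeId": "node_003",
        "correctionNodeId": "node_004",
        "resolvedNodeId": "node_006",
    }],
})
```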

## Lesson

```jsonc
{
  "id": "lesson_001",
  "title": "Preserve explicit constraints",
  "nodeIds": ["node_003", "node_004"],
  "text": "Future agents should carry explicit user constraints forward as high-priority requirements."
}
```

Lessons are compact rules for future agents. They should be specific enough to use in handoffs or memory packs.
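
To illustrate the handoff use, lessons can be rendered as a short Markdown list. The layout here is an assumption: a `# Lessons` heading, then one bullet per lesson with its source nodes. This schema does not define the format of `.treetrace/lessons.md`, and `render_lessons` is a hypothetical name.

```python
from typing import Any


def render_lessons(lessons: list[dict[str, Any]]) -> str:
    """Render Lesson objects as a Markdown bullet list (hypothetical layout)."""
    lines = ["# Lessons", ""]
    for lesson in lessons:
        sources = ", ".join(lesson.get("nodeIds", []))
        lines.append(f"- **{lesson['title']}**: {lesson['text']} (from {sources})")
    return "\n".join(lines) + "\n"


markdown = render_lessons([{
    "id": "lesson_001",
    "title": "Preserve explicit constraints",
    "nodeIds": ["node_003", "node_004"],
    "text": "Future agents should carry explicit user constraints forward as high-priority requirements.",
}])
```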

## EvalCandidate

```jsonc
{
  "id": "eval_001",
  "source": "treetrace",
  "type": "instruction_following_regression",
  "task": "Continue development while preserving the corrected direction from the session lineage.",
  "context": "The user rejected a web app and corrected the project toward a zero-config CLI.",
  "input": "Continue development of the project while preserving the corrected direction and constraints.",
  "expected_behavior": [
    "Use the corrected prompt lineage as durable context",
    "Do not repeat the documented failure mode"
  ],
  "failure_mode": "Agent repeats ignored constraint despite prior correction.",
  "sourceNodeIds": ["node_003", "node_004"]
}
```

Initial eval `type` values:

- `instruction_following_regression`
- `constraint_preservation`
- `scope_drift_detection`
- `correction_adherence`
- `privacy_boundary_preservation`
- `handoff_quality`
- `tool_choice_regression`
- `tool_permission_regression` (v0.3)
- `tool_error_recovery` (v0.3)
- `refusal_handling` (v0.3)
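
The `.treetrace/evals.jsonl` artifact, listed among the analysis outputs, suggests one candidate per line. This Python sketch of that serialization assumes each line holds one EvalCandidate object. The helper names `to_jsonl` and `from_jsonl` are hypothetical.

```python
import json
from typing import Any


def to_jsonl(candidates: list[dict[str, Any]]) -> str:
    """Serialize EvalCandidates as JSON Lines: one compact object per line."""
    # json.dumps without indent escapes newlines inside strings, and the
    # default ensure_ascii=True also escapes other line separators, so each
    # object occupies exactly one line.
    return "".join(json.dumps(c) + "\n" for c in candidates)


def from_jsonl(text: str) -> list[dict[str, Any]]:
    """Parse JSON Lines back into objects, skipping blank lines."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]


candidates = [
    {"id": "eval_001", "type": "instruction_following_regression", "sourceNodeIds": ["node_003", "node_004"]},
    {"id": "eval_002", "type": "correction_adherence", "sourceNodeIds": ["node_004"]},
]
payload = to_jsonl(candidates)
```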

## hallucinations.json (--security)

Written to `.treetrace/hallucinations.json` when `--security` is passed. It requires a `--dir` that points to a real project tree so that file existence and package manifests can be checked.

```jsonc
{
  "schemaVersion": "0.3",
  "project": { "name": "...", "generatedAt": "ISO-8601" },
  "verifiedAgainstWorkingTree": true,
  "manifestSeen": true,
  "summary": {
    "total": 2,
    "byCategory": {
      "hallucinated_file_or_path": 1,
      "hallucinated_import_or_package": 1
    }
  },
  "hallucinations": [
    {
      "category": "hallucinated_file_or_path",
      "reference": "./src/middleware/rateLimit.js",
      "nodeId": "node_001",
      "evidence": "Referenced ... which does not exist in the working tree and was not created during the session.",
      "evalCandidate": {
        "type": "reference_existence_check",
        "task": "Verify a file or path exists in the working tree before editing or relying on it.",
        "target": "./src/middleware/rateLimit.js"
      }
    }
  ],
  "note": "..."
}
```

`category` enum:

- `hallucinated_file_or_path` - a relative file/path token appears in scannable text but does not exist on disk and was not created during the session
- `hallucinated_import_or_package` - a JS or Python import specifier is not a declared dependency and is not a standard-library/builtin module

`verifiedAgainstWorkingTree` is `false` when the project directory could not be resolved. `manifestSeen` is `false` when no `package.json`, lockfile, or `requirements.txt` was found.

Detection covers user prompt text, tool action inputs, and tool commands. It does not scan assistant prose (assistant turns are not stored in `node.text`), and it does not resolve per-symbol exports inside a module.
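
The file/path rule and the `summary` block can be sketched in Python as below. The helper names are hypothetical. Normalizing paths before comparing them against the set of files created during the session is an assumption about how "created during the session" is matched.

```python
import os
import tempfile
from collections import Counter
from pathlib import Path
from typing import Any, Iterable


def missing_paths(refs: Iterable[str], project_dir: str, created: Iterable[str]) -> list[str]:
    """References that neither exist under project_dir nor were created in-session."""
    created_norm = {os.path.normpath(p) for p in created}
    missing = []
    for ref in refs:
        if os.path.normpath(ref) in created_norm:
            continue  # written during the session, so not a hallucination
        if not (Path(project_dir) / ref).exists():
            missing.append(ref)
    return missing


def summarize(hallucinations: list[dict[str, Any]]) -> dict[str, Any]:
    """Build the summary block: a total plus a count per category."""
    by_category = Counter(h["category"] for h in hallucinations)
    return {"total": len(hallucinations), "byCategory": dict(by_category)}


with tempfile.TemporaryDirectory() as root:
    (Path(root) / "src").mkdir()
    (Path(root) / "src" / "app.js").touch()
    found = missing_paths(
        ["./src/app.js", "./src/middleware/rateLimit.js", "./src/new.js"],
        root,
        created={"src/new.js"},
    )

summary = summarize([
    {"category": "hallucinated_file_or_path"},
    {"category": "hallucinated_import_or_package"},
])
```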

## Separate Analysis Artifacts

TreeTrace also writes a combined human report plus focused files derived from the same redacted tree:

- `TREETRACE_REPORT.md`
- `.treetrace/failures.json`
- `.treetrace/lessons.md`
- `.treetrace/evals.jsonl`
- `.treetrace/agent-memory.md`

These files must not contain raw assistant logs or unredacted secrets.

## Composing With Agent Trace

An Agent Trace record can point to a TreeTrace session and node range:

- Agent Trace `conversation` -> TreeTrace `sessions[].id`
- Agent Trace line-range records -> work performed between two TreeTrace node IDs
- TreeTrace correction chains -> regression tests or code-review context for the next agent

This keeps responsibilities clean: Agent Trace handles code attribution; TreeTrace handles human steering and correction memory.

## Mapping to W3C PROV

For provenance tooling:

- each `PromptNode` is a `prov:Activity`
- the human is a `prov:Agent`
- edges are `prov:wasInformedBy`
- exported artifacts are `prov:Entity`
- correction chains can be modeled as qualified derivations (`prov:qualifiedDerivation`) linking an entity from the corrected activity back to an entity from the failure activity
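
A minimal sketch of the activity, agent, and `wasInformedBy` parts of this mapping as a PROV-JSON document, in Python. The `tt` prefix and its `urn:treetrace:` namespace, the `tt:human` agent id, and the `_:wibN` blank-node ids are all hypothetical choices. The relation direction follows PROV, where `prov:informed` is the later activity and `prov:informant` the earlier one.

```python
from typing import Any


def to_prov_json(doc: dict[str, Any]) -> dict[str, Any]:
    """Map PromptNodes to activities and Edges to wasInformedBy relations."""
    prov: dict[str, Any] = {
        "prefix": {"tt": "urn:treetrace:"},  # hypothetical namespace
        "agent": {"tt:human": {}},           # the steering human
        "activity": {},
        "wasInformedBy": {},
    }
    for node in doc.get("nodes", []):
        prov["activity"][f"tt:{node['id']}"] = {"prov:label": node.get("title", "")}
    for i, edge in enumerate(doc.get("edges", []), start=1):
        # The child prompt ("to") was informed by its parent ("from").
        prov["wasInformedBy"][f"_:wib{i}"] = {
            "prov:informed": f"tt:{edge['to']}",
            "prov:informant": f"tt:{edge['from']}",
        }
    return prov


bundle = to_prov_json({
    "nodes": [
        {"id": "node_001", "title": "Build a CLI"},
        {"id": "node_002", "title": "Use zero-config defaults"},
    ],
    "edges": [{"from": "node_001", "to": "node_002", "relationship": "refines"}],
})
```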

## Stability

- `schemaVersion` gets a minor-version bump for additive changes.
- Consumers MUST ignore unknown fields.
- Enum values may gain members.
- New top-level arrays may be absent, empty, or partially populated.
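
A consumer-side compatibility check that follows these rules might look like the Python sketch below. Treating a different leading version number as unsupported is an assumption, because the rules above only describe additive minor changes. The function name and return labels are hypothetical.

```python
def compatibility(schema_version: str, reader_version: str = "0.3") -> str:
    """Classify a document's schemaVersion relative to what the reader supports."""
    doc_major, doc_minor = (int(part) for part in schema_version.split(".")[:2])
    reader_major, reader_minor = (int(part) for part in reader_version.split(".")[:2])
    if doc_major != reader_major:
        return "unsupported"  # assumption: not covered by the additive rules
    if doc_minor > reader_minor:
        # Newer minor: additive only, so ignore unknown fields and enum values.
        return "readable-ignore-unknown"
    # Same or older minor: newer optional arrays may simply be absent.
    return "readable"
```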