| @@ -2,6 +2,14 @@ | ||
| Notable changes to TreeTrace. The format follows Keep a Changelog, and the project uses semantic versioning. | ||
| + | ## Unreleased | |
| + | ||
| + | ### Added | |
| + | ||
| + | - `--security` report mode. Prints a security-focused report that leads with concrete failure classes and answers five questions from the existing analysis: whether the agent touched auth, secrets, access control, crypto, dependency config, CI, deployment, or tests; whether it disabled or skipped tests; whether it ran risky shell commands; whether it referenced files, paths, imports, or packages that do not exist; and which human correction should become a future eval or memory item. It reuses the same signals as the full analysis and does not run a separate scanner. The report prints to stdout and the run writes `.treetrace/hallucinations.json`; both are gated through the redaction shadow scan. | |
| + | - Deterministic hallucination detector. TreeTrace runs inside the repository, so it extracts the files, paths, imports, and packages the agent referenced in prompts and captured actions, then verifies them against the real working tree and `package.json`, `package-lock.json`, and Python manifests. References that do not resolve are flagged as likely hallucinations in two categories, `hallucinated_file_or_path` and `hallucinated_import_or_package`, and surfaced both in the security report and in `.treetrace/hallucinations.json` (mirroring the `failures.json` shape). Each finding carries an eval candidate. The detector checks that files and paths exist and that imports and packages are declared; it does not attempt per-symbol or per-API resolution inside a module, and the tool says so. Files the agent created during the session, relative paths, Node builtins, and Python standard library modules are excluded to avoid false positives. | |
| + | - Read-only MCP server. `treetrace mcp` (or `treetrace --mcp`) starts a Model Context Protocol server over stdio using JSON-RPC 2.0, hand-rolled with no dependencies. It implements `initialize`, `tools/list`, and `tools/call`, and exposes four read-only tools that reuse existing functionality: `handoff`, `lessons`, `security_summary`, and `eval_candidates`. No tool mutates files, runs shell commands, reaches the network, or requires authentication. All returned text passes the same redaction shadow scan as the file exports. | |
| + | ||
| ## 0.4.1 - 2026-06-13 | ||
| A fix release driven by an adversarial end-to-end test pass across every adapter on real sessions. See [TESTING.md](TESTING.md) for the method and coverage. |
| @@ -81,6 +81,7 @@ Failure to eval to handoff: every correction you made by hand becomes a guardrai | ||
| | `PROMPT_TREE.md` | Human-readable narrative of the build path | | ||
| | `.treetrace/tree.json` | Canonical machine-readable lineage schema | | ||
| | `.treetrace/failures.json` | Failure signals, correction chains, and summaries | | ||
| + | | `.treetrace/hallucinations.json` | Files, paths, imports, and packages the agent referenced that do not exist in the working tree | | |
| | `.treetrace/lessons.md` | Human-readable lessons for future work | | ||
| | `.treetrace/evals.jsonl` | Generic model-agnostic eval cases | | ||
| | `.treetrace/agent-memory.md` | Compact memory pack for Codex, Claude Code, Cursor, or another agent | | ||
| @@ -100,6 +101,8 @@ Failure to eval to handoff: every correction you made by hand becomes a guardrai | ||
| | `npx treetrace --lessons` | Write and print `.treetrace/lessons.md` | | ||
| | `npx treetrace --evals` | Write and print `.treetrace/evals.jsonl` | | ||
| | `npx treetrace --memory` | Write and print `.treetrace/agent-memory.md` | | ||
| + | | `npx treetrace --security` | Print a security-focused report and write `.treetrace/hallucinations.json` | | |
| + | | `npx treetrace mcp` | Start a read-only MCP server over stdio | | |
| | `npx treetrace --titles-only` | Compact human tree, no full prompt details | | ||
| | `npx treetrace --redact-auto` | Redact every detected secret without prompting | | ||
| | `npx treetrace --since 2026-06-01` | Limit to sessions on or after a date | | ||
| @@ -150,6 +153,40 @@ The goal is not judgment. The goal is regression memory: identify what future ag | ||
| The format is intentionally model-agnostic. Adapters for promptfoo, OpenAI Evals-style harnesses, LangSmith-style datasets, and other eval systems can build from this JSONL without changing TreeTrace's local-first core. | ||
| + | ## Security report | |
| + | ||
| + | `treetrace --security` prints a security-focused report that leads with concrete failure classes. It reuses the same analysis as the full run and answers five questions: | |
| + | ||
| + | 1. Did the agent touch auth, secrets, access control, crypto, dependency config, CI, deployment, or tests? | |
| + | 2. Did it disable or skip tests? | |
| + | 3. Did it run risky shell commands? | |
| + | 4. Did it reference files, paths, imports, or packages that do not exist? | |
| + | 5. What human correction should become a future eval or memory item? | |
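
The first question is answered by matching touched file paths against ordered path rules. The sketch below copies three of the eight rules from `SECURITY_SURFACE_RULES` in this diff; the `RULES` and `classify` names and the sample paths are illustrative, not part of TreeTrace.

```javascript
// Three of the path rules from the diff, copied verbatim; the full list has eight.
const RULES = [
  { surface: 'auth', re: /(?:^|[\\/])[^\\/]*(?:auth|login|signin|signup|session|oauth|jwt|sso|saml)[^\\/]*$/i },
  { surface: 'dependency-config', re: /(?:^|[\\/])(?:package\.json|package-lock\.json|yarn\.lock|pnpm-lock\.yaml|requirements\.txt|pyproject\.toml|Pipfile|go\.mod|Cargo\.toml|Gemfile)$/i },
  { surface: 'ci', re: /(?:^|[\\/])(?:\.github[\\/]workflows[\\/][^\\/]+|\.gitlab-ci\.yml|\.circleci[\\/][^\\/]+|azure-pipelines\.yml|Jenkinsfile)$/i },
];

// First matching rule wins; a path that matches no rule is not a security surface.
function classify(file) {
  if (!file) return null;
  for (const rule of RULES) {
    if (rule.re.test(file)) return rule.surface;
  }
  return null;
}

// Hypothetical paths an agent might have touched in a session.
const touched = ['src/routes/login.ts', 'package.json', '.github/workflows/ci.yml', 'README.md'];
const surfaces = touched.map((file) => ({ file, surface: classify(file) }));
console.log(surfaces);
```

Because the rules are evaluated in order, a file that could fit two surfaces is reported under the earlier one.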
| + | ||
| + | The report goes to stdout and the run writes `.treetrace/hallucinations.json`. Both pass the redaction shadow scan before anything is printed or written. | |
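
The ordering described above (scan every output, then emit) can be sketched as follows. This is an illustration only: `scanForLeaks` is a hypothetical stand-in for the shadow scan, and `gateThenEmit` is not a TreeTrace function.

```javascript
// Hypothetical stand-in for the redaction shadow scan: returns every known
// secret that still appears verbatim in the rendered text.
function scanForLeaks(rendered, secrets) {
  return secrets.filter((secret) => rendered.includes(secret));
}

// Check every output first, then emit. If any check throws, nothing is emitted.
function gateThenEmit(outputs, secrets, emit) {
  for (const { label, text } of outputs) {
    const leaks = scanForLeaks(text, secrets);
    if (leaks.length) {
      throw new Error(`refusing to emit ${label}: ${leaks.length} unredacted value(s)`);
    }
  }
  for (const output of outputs) emit(output);
}

// Usage with made-up values: both outputs are clean, so both are emitted.
const emitted = [];
gateThenEmit(
  [
    { label: 'security report', text: 'no secrets here' },
    { label: 'hallucinations.json', text: '{}' },
  ],
  ['example-secret-value'],
  (output) => emitted.push(output.label),
);
console.log(emitted);
```

Running all checks before the first write keeps the run all-or-nothing: a leak in the second output cannot leave the first one already on disk.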
| + | ||
| + | ## Hallucination detection | |
| + | ||
| + | TreeTrace runs inside the repository, so it can verify what the agent claimed against what is actually there. It extracts the files, paths, imports, and packages referenced in prompts and captured actions, then checks them against the real working tree and the manifests (`package.json`, `package-lock.json`, and the Python manifests `requirements.txt`, `pyproject.toml`, and `Pipfile`). References that do not resolve are flagged in two categories: | |
| + | ||
| + | - `hallucinated_file_or_path` | |
| + | - `hallucinated_import_or_package` | |
| + | ||
| + | Each finding becomes an eval candidate, for example "verify the file or import exists before editing." The checks are fully deterministic: a file or path either exists or does not, and an import or package is either declared or not. To avoid false positives, files the agent created during the session, relative paths, Node builtins, and Python standard library modules are excluded. | |
| + | ||
| + | The detector has deliberate limits. File, path, import, and package existence checks are reliable. Per-symbol and per-API resolution inside a module is not attempted, because that would need an AST and a language toolchain, which would break the zero-dependency promise. TreeTrace does not claim to detect a hallucinated function or method on a real module. | |
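
The import half of the check can be sketched as below. The regex is the `JS_IMPORT_RE` from this diff; the builtin list is abbreviated, and `undeclaredImports` with its sample input is a hypothetical illustration rather than the detector's actual entry point.

```javascript
// Matches `import ... from 'x'`, `export ... from 'x'`, `require('x')`, and `import('x')`.
const JS_IMPORT_RE =
  /\b(?:import|export)\b[^;\n]*?\bfrom\s*['"]([^'"\n]+)['"]|\brequire\(\s*['"]([^'"\n]+)['"]\s*\)|\bimport\(\s*['"]([^'"\n]+)['"]\s*\)/g;

// Abbreviated builtin list, for the example only.
const BUILTINS = new Set(['fs', 'path', 'os', 'crypto', 'http', 'url', 'util']);

// 'lodash/fp' -> 'lodash'; '@scope/pkg/sub' -> '@scope/pkg'.
function packageRoot(spec) {
  const parts = spec.split('/');
  return spec.startsWith('@') ? parts.slice(0, 2).join('/') : parts[0];
}

// Returns package roots that are imported in `text` but absent from `declared`.
function undeclaredImports(text, declared) {
  const flagged = new Set();
  for (const m of text.matchAll(JS_IMPORT_RE)) {
    const spec = m[1] || m[2] || m[3];
    // Relative, absolute, and node: specifiers are never package references.
    if (/^\.{1,2}\//.test(spec) || spec.startsWith('/') || spec.startsWith('node:')) continue;
    const root = packageRoot(spec);
    if (BUILTINS.has(root) || declared.has(root)) continue;
    flagged.add(root);
  }
  return [...flagged];
}

const snippet = [
  "import pad from 'left-pad';",
  "import fs from 'node:fs';",
  "import path from 'path';",
  "const local = require('./local');",
  "import sub from '@scope/pkg/sub';",
].join('\n');
const result = undeclaredImports(snippet, new Set(['@scope/pkg']));
console.log(result);
```

With only `@scope/pkg` declared, `left-pad` is the one reference flagged. Nothing here inspects what is imported from a module, which is the per-symbol limit described above.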
| + | ||
| + | ## MCP server | |
| + | ||
| + | `treetrace mcp` (or `treetrace --mcp`) starts a Model Context Protocol server over stdio. It speaks JSON-RPC 2.0, is hand-rolled with no dependencies, and implements `initialize`, `tools/list`, and `tools/call`. It exposes four read-only tools, each reusing existing functionality: | |
| + | ||
| + | - `handoff` - the continuation brief for the next agent | |
| + | - `lessons` - accepted constraints and repeated corrections | |
| + | - `security_summary` - evidence-backed security-sensitive touches | |
| + | - `eval_candidates` - compact regression cases | |
| + | ||
| + | No tool mutates files, runs shell commands, reaches the network, or requires authentication. All returned text passes the same redaction shadow scan as the file exports. Point it at a project with `--dir`, or import a transcript with `--file` or `--stdin`, exactly like a normal run. | |
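
A minimal sketch of what a hand-rolled JSON-RPC 2.0 dispatcher for these three methods might look like, assuming newline-delimited messages on stdio. This is not the code in `mcp.js`: `handle`, `frame`, and the `runTool` callback are hypothetical, and echoing the client's `protocolVersion` is a simplification.

```javascript
const TOOLS = ['handoff', 'lessons', 'security_summary', 'eval_candidates'];

// Stdio framing: one JSON-RPC message per line.
function frame(message) {
  return JSON.stringify(message) + '\n';
}

// `runTool` is a hypothetical callback returning already-redacted text for a tool name.
function handle(request, runTool) {
  const { id, method, params = {} } = request;
  const ok = (result) => ({ jsonrpc: '2.0', id, result });
  const fail = (code, message) => ({ jsonrpc: '2.0', id, error: { code, message } });

  switch (method) {
    case 'initialize':
      return ok({
        protocolVersion: params.protocolVersion, // simplification: echo the client's version
        capabilities: { tools: {} },
        serverInfo: { name: 'treetrace', version: '0.0.0-example' },
      });
    case 'tools/list':
      return ok({
        tools: TOOLS.map((name) => ({ name, inputSchema: { type: 'object', properties: {} } })),
      });
    case 'tools/call':
      // -32602 is the JSON-RPC "invalid params" code.
      if (!TOOLS.includes(params.name)) return fail(-32602, `unknown tool: ${params.name}`);
      return ok({ content: [{ type: 'text', text: runTool(params.name) }] });
    default:
      // -32601 is the JSON-RPC "method not found" code.
      return fail(-32601, `method not found: ${method}`);
  }
}

const request = { jsonrpc: '2.0', id: 1, method: 'tools/call', params: { name: 'lessons', arguments: {} } };
const response = handle(request, (name) => `(${name} output)`);
process.stdout.write(frame(response));
```

Because the dispatcher only ever calls `runTool` with one of four fixed names, there is no code path from a request to a file write, a shell command, or a network call.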
| + | ||
| ## Redaction gate | ||
| A privacy-positioned tool gets exactly one chance with your secrets, so every export goes through the same gate: |
| @@ -90,6 +90,35 @@ function isCredentialFile(file) { | ||
| return true; | ||
| } | ||
| + | const SECURITY_SURFACE_RULES = [ | |
| + | { surface: 'auth', re: /(?:^|[\\/])[^\\/]*(?:auth|login|signin|signup|session|oauth|jwt|sso|saml)[^\\/]*$/i }, | |
| + | { surface: 'secrets', re: /(?:^|[\\/])(?:\.env[^\\/]*|[^\\/]*(?:secret|credential|password|passwd|apikey|api[-_]key|token)[^\\/]*)$/i }, | |
| + | { surface: 'access-control', re: /(?:^|[\\/])[^\\/]*(?:rbac|permission|access[-_]?control|policy|policies|guard|middleware)[^\\/]*$/i }, | |
| + | { surface: 'crypto', re: /(?:^|[\\/])[^\\/]*(?:crypto|cipher|encrypt|decrypt|hash|hmac|signature|signing)[^\\/]*$/i }, | |
| + | { surface: 'dependency-config', re: /(?:^|[\\/])(?:package\.json|package-lock\.json|yarn\.lock|pnpm-lock\.yaml|requirements\.txt|pyproject\.toml|Pipfile|go\.mod|Cargo\.toml|Gemfile)$/i }, | |
| + | { surface: 'ci', re: /(?:^|[\\/])(?:\.github[\\/]workflows[\\/][^\\/]+|\.gitlab-ci\.yml|\.circleci[\\/][^\\/]+|azure-pipelines\.yml|Jenkinsfile)$/i }, | |
| + | { surface: 'deployment', re: /(?:^|[\\/])(?:Dockerfile|docker-compose[^\\/]*\.ya?ml|[^\\/]*\.(?:tf|tfvars)|wrangler\.toml|vercel\.json|netlify\.toml|fly\.toml|[^\\/]*deploy[^\\/]*)$/i }, | |
| + | { surface: 'tests', re: /(?:^|[\\/])[^\\/]*(?:\.(?:test|spec)\.[a-z0-9]+|_test\.[a-z0-9]+|test_[^\\/]+)$|(?:^|[\\/])(?:tests?|__tests__|spec)[\\/]/i }, | |
| + | ]; | |
| + | const TEST_SKIP_RE = | |
| + | /\b(?:disabl|skip|remov|delet|comment(?:ed)? out|drop|turn(?:ed)? off|x?(?:it|describe)\.skip|--no-tests?|--skip-tests?)\w*\b[^.\n]{0,24}\btests?\b|\btests?\b[^.\n]{0,24}\b(?:disabl|skip|remov|delet|comment(?:ed)? out|turn(?:ed)? off)\w*/i; | |
| + | ||
| + | export function classifySecuritySurface(file) { | |
| + | if (!file) return null; | |
| + | for (const rule of SECURITY_SURFACE_RULES) { | |
| + | if (rule.re.test(file)) return rule.surface; | |
| + | } | |
| + | return null; | |
| + | } | |
| + | ||
| + | export function isRiskyCommand(command) { | |
| + | return typeof command === 'string' && RISKY_CMD_RE.test(command); | |
| + | } | |
| + | ||
| + | export function mentionsTestSkip(text) { | |
| + | return typeof text === 'string' && text.length <= 4000 && TEST_SKIP_RE.test(text); | |
| + | } | |
| + | ||
| function securityActions(node) { | ||
| const out = []; | ||
| for (const a of node.actions || []) { |
| @@ -18,6 +18,9 @@ import { | ||
| renderMemoryMarkdown, | ||
| } from './analyze.js'; | ||
| import { makeTitle } from './extract.js'; | ||
| + | import { renderHallucinationsJson } from './hallucinate.js'; | |
| + | import { renderSecurityReport } from './security-report.js'; | |
| + | import { startMcpServer } from './mcp.js'; | |
| import { c, plural, truncate } from './util.js'; | ||
| const VERSION = JSON.parse(readFileSync(new URL('../package.json', import.meta.url), 'utf8')).version; | ||
| @@ -35,6 +38,8 @@ Usage: | ||
| treetrace --lessons write and print lessons Markdown | ||
| treetrace --evals write and print eval JSONL | ||
| treetrace --memory write and print compact agent memory | ||
| + | treetrace --security print a security-focused report for this session | |
| + | treetrace mcp start a read-only MCP server over stdio | |
| Options: | ||
| --from <tool> input format for --file: claude, codex, chatgpt, gemini, | ||
| @@ -45,6 +50,8 @@ Options: | ||
| --json also print lineage JSON to stdout | ||
| --analysis write failure, lesson, eval, and memory artifacts | ||
| --titles-only omit full prompt texts from the markdown tree | ||
| + | --security print a security-focused report and write hallucinations.json | |
| + | --mcp start a read-only MCP server over stdio (same as: treetrace mcp) | |
| --redact-auto redact every detected secret without prompting | ||
| --since <YYYY-MM-DD> only include sessions active on/after this date | ||
| --quiet suppress progress output | ||
| @@ -58,11 +65,96 @@ export async function main(argv) { | ||
| const opts = parseArgs(argv); | ||
| if (opts.help) return void console.log(HELP); | ||
| if (opts.version) return void console.log(VERSION); | ||
| + | if (opts.mcp) return await startMcpServer({ argv, version: VERSION }); | |
| const projectDir = resolve(opts.dir || process.cwd()); | ||
| const projectName = detectProjectName(projectDir); | ||
| const log = opts.quiet ? () => {} : (msg) => process.stderr.write(`${msg}\n`); | ||
| + | const { tree, decisions, asked, sourceTool } = await loadRedactedTree(opts, projectDir, projectName, log); | |
| + | ||
| + | const ttDir = join(projectDir, '.treetrace'); | |
| + | const decisionsPath = join(ttDir, 'redactions.json'); | |
| + | ||
| + | const generatedAt = new Date().toISOString(); | |
| + | const renderOpts = { projectName, titlesOnly: opts.titlesOnly, version: VERSION, generatedAt, sourceType: sourceTypeFor(sourceTool) }; | |
| + | ||
| + | if (opts.handoff) { | |
| + | const pack = renderHandoff(tree, renderOpts); | |
| + | assertClean(pack, decisions, 'handoff brief'); | |
| + | process.stdout.write(pack); | |
| + | log(c.green(`✓ handoff brief for ${projectName} (${plural(tree.stats.promptCount, 'prompt')} distilled)`)); | |
| + | return; | |
| + | } | |
| + | ||
| + | if (opts.security) { | |
| + | const securityReport = renderSecurityReport(tree, projectDir, renderOpts); | |
| + | const hallucinationsText = JSON.stringify(renderHallucinationsJson(tree, projectDir, renderOpts), null, 2); | |
| + | assertClean(securityReport, decisions, 'security report'); | |
| + | assertClean(hallucinationsText, decisions, 'hallucinations.json'); | |
| + | mkdirSync(projectDir, { recursive: true }); | |
| + | mkdirSync(ttDir, { recursive: true }); | |
| + | writeFileSync(join(ttDir, 'hallucinations.json'), hallucinationsText); | |
| + | writeFileSync(decisionsPath, JSON.stringify(decisions, null, 2)); | |
| + | process.stdout.write(securityReport); | |
| + | log(c.green(`✓ security report for ${projectName}; wrote .treetrace/hallucinations.json`)); | |
| + | return; | |
| + | } | |
| + | ||
| + | const md = renderMarkdown(tree, renderOpts); | |
| + | const json = renderJson(tree, renderOpts); | |
| + | const jsonText = JSON.stringify(json, null, 2); | |
| + | const artifacts = analysisArtifacts(ttDir, tree, renderOpts, projectDir); | |
| + | const outPath = resolve(projectDir, opts.out || 'PROMPT_TREE.md'); | |
| + | const reportPath = resolve(projectDir, opts.reportFile || 'TREETRACE_REPORT.md'); | |
| + | const report = renderReportMarkdown(tree, renderOpts); | |
| + | ||
| + | const requested = requestedArtifacts(opts, artifacts); | |
| + | if (requested.length && !opts.report) { | |
| + | for (const artifact of requested) assertClean(artifact.text, decisions, artifact.label); | |
| + | mkdirSync(projectDir, { recursive: true }); | |
| + | mkdirSync(ttDir, { recursive: true }); | |
| + | for (const artifact of requested) writeFileSync(artifact.path, artifact.text); | |
| + | writeFileSync(decisionsPath, JSON.stringify(decisions, null, 2)); | |
| + | if (requested.length === 1) { | |
| + | process.stdout.write(requested[0].text); | |
| + | } else { | |
| + | process.stdout.write(requested.map((a) => `# ${a.label}\n\n${a.text}`).join('\n')); | |
| + | } | |
| + | log(c.green(`wrote ${requested.map((a) => relativeish(a.path, projectDir)).join(', ')}`)); | |
| + | return; | |
| + | } | |
| + | ||
| + | assertClean(md, decisions, 'PROMPT_TREE.md'); | |
| + | assertClean(jsonText, decisions, 'tree.json'); | |
| + | for (const artifact of Object.values(artifacts)) assertClean(artifact.text, decisions, artifact.label); | |
| + | assertClean(report, decisions, 'TREETRACE_REPORT.md'); | |
| + | ||
| + | mkdirSync(projectDir, { recursive: true }); | |
| + | mkdirSync(ttDir, { recursive: true }); | |
| + | writeFileSync(outPath, md); | |
| + | writeFileSync(reportPath, report); | |
| + | writeFileSync(join(ttDir, 'tree.json'), jsonText); | |
| + | for (const artifact of Object.values(artifacts)) writeFileSync(artifact.path, artifact.text); | |
| + | ||
| + | writeFileSync(decisionsPath, JSON.stringify(decisions, null, 2)); | |
| + | ||
| + | if (opts.json) process.stdout.write(jsonText + '\n'); | |
| + | if (opts.report) process.stdout.write(report); | |
| + | ||
| + | log(''); | |
| + | log(summaryLine(tree.stats, projectName)); | |
| + | log(renderTerminalSummary(tree, renderOpts).trimEnd()); | |
| + | previewTree(tree, log); | |
| + | log(''); | |
| + | log( | |
| + | `${c.green('ok')} wrote ${c.bold(relativeish(reportPath, projectDir))}, ${c.bold(relativeish(outPath, projectDir))}, .treetrace/tree.json, and analysis artifacts` | |
| + | ); | |
| + | if (!opts.report) log(c.dim(' run `treetrace --report` to print the human report in this terminal')); | |
| + | if (asked) log(c.dim(` ${plural(asked, 'redaction decision')} saved to .treetrace/redactions.json`)); | |
| + | } | |
| + | ||
| + | export async function loadRedactedTree(opts, projectDir, projectName, log = () => {}, { forceAuto = false } = {}) { | |
| let sessions = []; | ||
| let sourceTool = 'claude'; | ||
| if (opts.stdin) { | ||
| @@ -138,10 +230,10 @@ export async function main(argv) { | ||
| } | ||
| } | ||
| - | const interactive = process.stdin.isTTY && process.stderr.isTTY && !opts.redactAuto; | |
| + | const interactive = !forceAuto && process.stdin.isTTY && process.stderr.isTTY && !opts.redactAuto; | |
| const { decisions, asked, autoRedacted } = await resolveFindings(findings, priorDecisions, { | ||
| interactive, | ||
| - | autoRedact: opts.redactAuto, | |
| + | autoRedact: forceAuto || opts.redactAuto, | |
| }); | ||
| if (autoRedacted) { | ||
| log( | ||
| @@ -165,68 +257,7 @@ export async function main(argv) { | ||
| } | ||
| analyzeTree(tree); | ||
| - | const generatedAt = new Date().toISOString(); | |
| - | const renderOpts = { projectName, titlesOnly: opts.titlesOnly, version: VERSION, generatedAt, sourceType: sourceTypeFor(sourceTool) }; | |
| - | ||
| - | if (opts.handoff) { | |
| - | const pack = renderHandoff(tree, renderOpts); | |
| - | assertClean(pack, decisions, 'handoff brief'); | |
| - | process.stdout.write(pack); | |
| - | log(c.green(`✓ handoff brief for ${projectName} (${plural(tree.stats.promptCount, 'prompt')} distilled)`)); | |
| - | return; | |
| - | } | |
| - | ||
| - | const md = renderMarkdown(tree, renderOpts); | |
| - | const json = renderJson(tree, renderOpts); | |
| - | const jsonText = JSON.stringify(json, null, 2); | |
| - | const artifacts = analysisArtifacts(ttDir, tree, renderOpts); | |
| - | const outPath = resolve(projectDir, opts.out || 'PROMPT_TREE.md'); | |
| - | const reportPath = resolve(projectDir, opts.reportFile || 'TREETRACE_REPORT.md'); | |
| - | const report = renderReportMarkdown(tree, renderOpts); | |
| - | ||
| - | const requested = requestedArtifacts(opts, artifacts); | |
| - | if (requested.length && !opts.report) { | |
| - | for (const artifact of requested) assertClean(artifact.text, decisions, artifact.label); | |
| - | mkdirSync(projectDir, { recursive: true }); | |
| - | mkdirSync(ttDir, { recursive: true }); | |
| - | for (const artifact of requested) writeFileSync(artifact.path, artifact.text); | |
| - | writeFileSync(decisionsPath, JSON.stringify(decisions, null, 2)); | |
| - | if (requested.length === 1) { | |
| - | process.stdout.write(requested[0].text); | |
| - | } else { | |
| - | process.stdout.write(requested.map((a) => `# ${a.label}\n\n${a.text}`).join('\n')); | |
| - | } | |
| - | log(c.green(`wrote ${requested.map((a) => relativeish(a.path, projectDir)).join(', ')}`)); | |
| - | return; | |
| - | } | |
| - | ||
| - | assertClean(md, decisions, 'PROMPT_TREE.md'); | |
| - | assertClean(jsonText, decisions, 'tree.json'); | |
| - | for (const artifact of Object.values(artifacts)) assertClean(artifact.text, decisions, artifact.label); | |
| - | assertClean(report, decisions, 'TREETRACE_REPORT.md'); | |
| - | ||
| - | mkdirSync(projectDir, { recursive: true }); | |
| - | mkdirSync(ttDir, { recursive: true }); | |
| - | writeFileSync(outPath, md); | |
| - | writeFileSync(reportPath, report); | |
| - | writeFileSync(join(ttDir, 'tree.json'), jsonText); | |
| - | for (const artifact of Object.values(artifacts)) writeFileSync(artifact.path, artifact.text); | |
| - | ||
| - | writeFileSync(decisionsPath, JSON.stringify(decisions, null, 2)); | |
| - | ||
| - | if (opts.json) process.stdout.write(jsonText + '\n'); | |
| - | if (opts.report) process.stdout.write(report); | |
| - | ||
| - | log(''); | |
| - | log(summaryLine(tree.stats, projectName)); | |
| - | log(renderTerminalSummary(tree, renderOpts).trimEnd()); | |
| - | previewTree(tree, log); | |
| - | log(''); | |
| - | log( | |
| - | `${c.green('ok')} wrote ${c.bold(relativeish(reportPath, projectDir))}, ${c.bold(relativeish(outPath, projectDir))}, .treetrace/tree.json, and analysis artifacts` | |
| - | ); | |
| - | if (!opts.report) log(c.dim(' run `treetrace --report` to print the human report in this terminal')); | |
| - | if (asked) log(c.dim(` ${plural(asked, 'redaction decision')} saved to .treetrace/redactions.json`)); | |
| + | return { tree, decisions, asked, sourceTool }; | |
| } | ||
| const SOURCE_TYPE_BY_TOOL = { | ||
| @@ -284,13 +315,18 @@ async function ingestFile(file, from, log) { | ||
| return { sessions: [parsePlainTranscript(readFileSync(file, 'utf8'), basename(file))], tool: 'transcript' }; | ||
| } | ||
| - | function analysisArtifacts(ttDir, tree, renderOpts) { | |
| + | function analysisArtifacts(ttDir, tree, renderOpts, projectDir) { | |
| return { | ||
| failures: { | ||
| label: 'failures.json', | ||
| path: join(ttDir, 'failures.json'), | ||
| text: JSON.stringify(renderFailuresJson(tree, renderOpts), null, 2), | ||
| }, | ||
| + | hallucinations: { | |
| + | label: 'hallucinations.json', | |
| + | path: join(ttDir, 'hallucinations.json'), | |
| + | text: JSON.stringify(renderHallucinationsJson(tree, projectDir, renderOpts), null, 2), | |
| + | }, | |
| lessons: { | ||
| label: 'lessons.md', | ||
| path: join(ttDir, 'lessons.md'), | ||
| @@ -319,7 +355,7 @@ function requestedArtifacts(opts, artifacts) { | ||
| return requested; | ||
| } | ||
| - | function assertClean(rendered, decisions, label) { | |
| + | export function assertClean(rendered, decisions, label) { | |
| const leaks = shadowScan(rendered, decisions); | ||
| if (leaks.length) { | ||
| throw new Error( | ||
| @@ -382,7 +418,7 @@ function relativeish(p, base) { | ||
| return p.startsWith(base) ? p.slice(base.length + 1) : p; | ||
| } | ||
| - | function detectProjectName(dir) { | |
| + | export function detectProjectName(dir) { | |
| try { | ||
| const pkg = JSON.parse(readFileSync(join(dir, 'package.json'), 'utf8')); | ||
| if (pkg.name) return pkg.name; | ||
| @@ -392,7 +428,7 @@ function detectProjectName(dir) { | ||
| return basename(dir); | ||
| } | ||
| - | function parseArgs(argv) { | |
| + | export function parseArgs(argv) { | |
| const opts = { | ||
| files: [], | ||
| stdin: false, | ||
| @@ -404,6 +440,8 @@ function parseArgs(argv) { | ||
| lessons: false, | ||
| evals: false, | ||
| memory: false, | ||
| + | security: false, | |
| + | mcp: false, | |
| titlesOnly: false, | ||
| redactAuto: false, | ||
| quiet: false, | ||
| @@ -430,6 +468,8 @@ function parseArgs(argv) { | ||
| case '--lessons': opts.lessons = true; break; | ||
| case '--evals': opts.evals = true; break; | ||
| case '--memory': opts.memory = true; break; | ||
| + | case '--security': opts.security = true; break; | |
| + | case 'mcp': case '--mcp': opts.mcp = true; break; | |
| case '--titles-only': opts.titlesOnly = true; break; | ||
| case '--redact-auto': opts.redactAuto = true; break; | ||
| case '--quiet': opts.quiet = true; break; |
| @@ -0,0 +1,295 @@ | ||
| + | import { readFileSync, existsSync, statSync } from 'node:fs'; | |
| + | import { isAbsolute, join, resolve, normalize } from 'node:path'; | |
| + | import { truncate } from './util.js'; | |
| + | ||
| + | const NODE_BUILTINS = new Set([ | |
| + | 'assert', 'async_hooks', 'buffer', 'child_process', 'cluster', 'console', 'constants', | |
| + | 'crypto', 'dgram', 'diagnostics_channel', 'dns', 'domain', 'events', 'fs', 'http', | |
| + | 'http2', 'https', 'inspector', 'module', 'net', 'os', 'path', 'perf_hooks', 'process', | |
| + | 'punycode', 'querystring', 'readline', 'repl', 'stream', 'string_decoder', 'sys', | |
| + | 'timers', 'tls', 'trace_events', 'tty', 'url', 'util', 'v8', 'vm', 'wasi', 'worker_threads', 'zlib', | |
| + | ]); | |
| + | ||
| + | const PY_STDLIB = new Set([ | |
| + | 'os', 'sys', 're', 'json', 'math', 'random', 'datetime', 'time', 'collections', 'itertools', | |
| + | 'functools', 'typing', 'pathlib', 'subprocess', 'logging', 'argparse', 'unittest', 'asyncio', | |
| + | 'io', 'abc', 'enum', 'dataclasses', 'copy', 'hashlib', 'base64', 'csv', 'sqlite3', 'socket', | |
| + | 'threading', 'multiprocessing', 'shutil', 'glob', 'tempfile', 'traceback', 'inspect', 'string', | |
| + | 'textwrap', 'decimal', 'fractions', 'statistics', 'struct', 'pickle', 'http', 'urllib', 'xml', | |
| + | 'html', 'email', 'warnings', 'contextlib', 'operator', 'weakref', 'gc', 'platform', 'signal', | |
| + | ]); | |
| + | ||
| + | const FILE_TOKEN_RE = /(?:[\w@./+-]*\/)?[\w@.+-]+\.[A-Za-z][A-Za-z0-9]{0,9}\b/g; | |
| + | const REL_PREFIX_RE = /^(?:\.\/|\.\.\/)/; | |
| + | const URL_LIKE_RE = /:\/\//; | |
| + | const VERSION_LIKE_RE = /^\d+(?:\.\d+)+$/; | |
| + | const JS_IMPORT_RE = | |
| + | /\b(?:import|export)\b[^;\n]*?\bfrom\s*['"]([^'"\n]+)['"]|\brequire\(\s*['"]([^'"\n]+)['"]\s*\)|\bimport\(\s*['"]([^'"\n]+)['"]\s*\)/g; | |
| + | const PY_IMPORT_RE = /^[ \t]*(?:from\s+([A-Za-z_][\w.]*)\s+import\b|import\s+([A-Za-z_][\w.]*(?:\s*,\s*[A-Za-z_][\w.]*)*))/gm; | |
| + | ||
| + | const EVIDENCE_CAP = 120; | |
| + | const MAX_TEXT_SCAN = 20000; | |
| + | ||
| + | function readPackageNames(projectDir) { | |
| + | const names = new Set(); | |
| + | const pkgPath = join(projectDir, 'package.json'); | |
| + | if (existsSync(pkgPath)) { | |
| + | try { | |
| + | const pkg = JSON.parse(readFileSync(pkgPath, 'utf8')); | |
| + | for (const field of ['dependencies', 'devDependencies', 'peerDependencies', 'optionalDependencies']) { | |
| + | if (pkg[field] && typeof pkg[field] === 'object') { | |
| + | for (const name of Object.keys(pkg[field])) names.add(name); | |
| + | } | |
| + | } | |
| + | } catch { | |
| + | ||
| + | } | |
| + | } | |
| + | return names; | |
| + | } | |
| + | ||
| + | function readLockfilePackages(projectDir) { | |
| + | const names = new Set(); | |
| + | const lockPath = join(projectDir, 'package-lock.json'); | |
| + | if (existsSync(lockPath)) { | |
| + | try { | |
| + | const lock = JSON.parse(readFileSync(lockPath, 'utf8')); | |
| + | if (lock.packages && typeof lock.packages === 'object') { | |
| + | for (const key of Object.keys(lock.packages)) { | |
| + | const idx = key.lastIndexOf('node_modules/'); | |
| + | if (idx >= 0) names.add(key.slice(idx + 'node_modules/'.length)); | |
| + | } | |
| + | } | |
| + | if (lock.dependencies && typeof lock.dependencies === 'object') { | |
| + | for (const name of Object.keys(lock.dependencies)) names.add(name); | |
| + | } | |
| + | } catch { | |
| + | ||
| + | } | |
| + | } | |
| + | return names; | |
| + | } | |
| + | ||
| + | function readPyRequirements(projectDir) { | |
| + | const names = new Set(); | |
| + | for (const file of ['requirements.txt', 'pyproject.toml', 'Pipfile']) { | |
| + | const p = join(projectDir, file); | |
| + | if (!existsSync(p)) continue; | |
| + | try { | |
| + | const text = readFileSync(p, 'utf8'); | |
| + | for (const m of text.matchAll(/^[ \t]*['"]?([A-Za-z][\w.-]+)['"]?\s*(?:[=<>~!]=?|@|\s*=\s*)/gm)) { | |
| + | names.add(m[1].toLowerCase()); | |
| + | } | |
| + | } catch { | |
| + | ||
| + | } | |
| + | } | |
| + | return names; | |
| + | } | |
| + | ||
| + | function packageRoot(spec) { | |
| + | if (spec.startsWith('@')) { | |
| + | const parts = spec.split('/'); | |
| + | return parts.slice(0, 2).join('/'); | |
| + | } | |
| + | return spec.split('/')[0]; | |
| + | } | |
| + | ||
| + | function collectCreatedFiles(tree) { | |
| + | const created = new Set(); | |
| + | for (const node of tree.nodes) { | |
| + | for (const a of node.actions || []) { | |
| + | if (!a.file || typeof a.file !== 'string') continue; | |
| + | if (a.tool === 'Write' || a.tool === 'Edit' || a.tool === 'NotebookEdit') { | |
| + | created.add(normalizeFileKey(a.file)); | |
| + | } | |
| + | } | |
| + | } | |
| + | return created; | |
| + | } | |
| + | ||
| + | function normalizeFileKey(p) { | |
| + | return p.replace(/^\.?\//, '').replace(/\\/g, '/').toLowerCase(); | |
| + | } | |
| + | ||
| + | function looksLikeFileToken(tok) { | |
| + | if (tok.length < 3 || tok.length > 200) return false; | |
| + | if (URL_LIKE_RE.test(tok)) return false; | |
| + | if (VERSION_LIKE_RE.test(tok)) return false; | |
| + | const ext = tok.slice(tok.lastIndexOf('.') + 1).toLowerCase(); | |
| + | if (!ext || ext.length > 10) return false; | |
| + | return true; | |
| + | } | |
| + | ||
| + | function fileExists(projectDir, rel) { | |
| + | const clean = rel.replace(/^\.\//, ''); | |
| + | let target; | |
| + | if (isAbsolute(clean)) { | |
| + | target = clean; | |
| + | } else { | |
| + | target = resolve(projectDir, clean); | |
| + | } | |
| + | try { | |
| + | if (existsSync(target)) return true; | |
| + | } catch { | |
| + | ||
| + | } | |
| + | const base = clean.split('/').pop(); | |
| + | return globByBasename(projectDir, base, target); | |
| + | } | |
| + | ||
| + | function globByBasename(projectDir, base, fullCandidate) { | |
| + | try { | |
| + | const direct = join(projectDir, base); | |
| + | if (existsSync(direct) && statSync(direct).isFile()) return true; | |
| + | } catch { | |
| + | ||
| + | } | |
| + | return false; | |
| + | } | |
| + | ||
| + | function collectFileReferences(tree) { | |
| + | const refs = []; | |
| + | const seen = new Set(); | |
| + | const push = (raw, nodeId) => { | |
| + | const tok = raw.trim().replace(/^['"`(]+|['"`),.;:]+$/g, ''); | |
| + | if (!looksLikeFileToken(tok)) return; | |
| + | const key = normalizeFileKey(tok); | |
| + | if (seen.has(key)) return; | |
| + | seen.add(key); | |
| + | refs.push({ token: tok, key, nodeId }); | |
| + | }; | |
| + | for (const node of tree.nodes) { | |
| + | if (node.status === 'abandoned') continue; | |
| + | const text = String(node.text || '').slice(0, MAX_TEXT_SCAN); | |
| + | for (const m of text.matchAll(FILE_TOKEN_RE)) push(m[0], node.id); | |
| + | for (const a of node.actions || []) { | |
| + | const body = `${a.input || ''}`.slice(0, MAX_TEXT_SCAN); | |
| + | for (const m of body.matchAll(FILE_TOKEN_RE)) push(m[0], node.id); | |
| + | } | |
| + | } | |
| + | return refs; | |
| + | } | |
| + | ||
| + | function collectImportReferences(tree) { | |
| + | const refs = []; | |
| + | const seen = new Set(); | |
| + | const push = (spec, lang, nodeId) => { | |
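| + | if (!spec || isRelativeOrLocalSpec(spec)) return; // './x' would otherwise collapse to the root '.' and be flagged | |
| + | const root = lang === 'py' ? spec.split('.')[0] : packageRoot(spec); // Python modules are dotted: 'os.path' -> 'os' | |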
| + | if (!root) return; | |
| + | const key = `${lang}:${root}`; | |
| + | if (seen.has(key)) return; | |
| + | seen.add(key); | |
| + | refs.push({ spec: root, lang, nodeId }); | |
| + | }; | |
| + | for (const node of tree.nodes) { | |
| + | if (node.status === 'abandoned') continue; | |
| + | const sources = [String(node.text || '')]; | |
| + | for (const a of node.actions || []) { | |
| + | if (a.input) sources.push(String(a.input)); | |
| + | if (a.command) sources.push(String(a.command)); | |
| + | } | |
| + | for (const src of sources) { | |
| + | const text = src.slice(0, MAX_TEXT_SCAN); | |
| + | for (const m of text.matchAll(JS_IMPORT_RE)) push(m[1] || m[2] || m[3], 'js', node.id); | |
| + | for (const m of text.matchAll(PY_IMPORT_RE)) { | |
| + | if (m[1]) push(m[1], 'py', node.id); | |
| + | if (m[2]) for (const piece of m[2].split(',')) push(piece.trim(), 'py', node.id); | |
| + | } | |
| + | } | |
| + | } | |
| + | return refs; | |
| + | } | |
| + | ||
| + | function isRelativeOrLocalSpec(spec) { | |
| + | return REL_PREFIX_RE.test(spec) || spec.startsWith('/') || spec.startsWith('node:'); | |
| + | } | |
| + | ||
| + | export function detectHallucinations(tree, projectDir, opts = {}) { | |
| + | const hallucinations = []; | |
| + | if (!projectDir || !existsSync(projectDir)) { | |
| + | return { schemaVersion: '0.2', verifiedAgainstWorkingTree: false, hallucinations, summary: emptySummary() }; | |
| + | } | |
| + | ||
| + | const created = collectCreatedFiles(tree); | |
| + | const pkgNames = readPackageNames(projectDir); | |
| + | const lockNames = readLockfilePackages(projectDir); | |
| + | const pyNames = readPyRequirements(projectDir); | |
| + | const hasManifest = pkgNames.size > 0 || lockNames.size > 0 || pyNames.size > 0; | |
| + | ||
| + | for (const ref of collectFileReferences(tree)) { | |
| + | if (created.has(ref.key)) continue; | |
| + | if (REL_PREFIX_RE.test(ref.token)) continue; | |
| + | if (fileExists(projectDir, ref.token)) continue; | |
| + | hallucinations.push({ | |
| + | category: 'hallucinated_file_or_path', | |
| + | reference: truncate(ref.token, EVIDENCE_CAP), | |
| + | nodeId: ref.nodeId, | |
| + | evidence: `Referenced "${truncate(ref.token, EVIDENCE_CAP)}" which does not exist in the working tree and was not created during the session.`, | |
| + | evalCandidate: { | |
| + | type: 'reference_existence_check', | |
| + | task: 'Verify a file or path exists in the working tree before editing or relying on it.', | |
| + | target: truncate(ref.token, EVIDENCE_CAP), | |
| + | }, | |
| + | }); | |
| + | } | |
| + | ||
| + | for (const ref of collectImportReferences(tree)) { | |
| + | if (isRelativeOrLocalSpec(ref.spec)) continue; | |
| + | if (ref.lang === 'js') { | |
| + | if (NODE_BUILTINS.has(ref.spec) || NODE_BUILTINS.has(ref.spec.replace(/^node:/, ''))) continue; | |
| + | if (pkgNames.has(ref.spec) || lockNames.has(ref.spec)) continue; | |
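| + | // Skip JS checks when no JS manifest was read; a Python manifest alone says nothing about npm packages. | |
| + | if (pkgNames.size === 0 && lockNames.size === 0) continue; | |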
| + | } else { | |
| + | if (PY_STDLIB.has(ref.spec)) continue; | |
| + | if (pyNames.has(ref.spec.toLowerCase())) continue; | |
| + | if (pyNames.size === 0) continue; | |
| + | } | |
| + | hallucinations.push({ | |
| + | category: 'hallucinated_import_or_package', | |
| + | reference: truncate(ref.spec, EVIDENCE_CAP), | |
| + | nodeId: ref.nodeId, | |
| + | evidence: `Imported "${truncate(ref.spec, EVIDENCE_CAP)}" (${ref.lang}) which is not a declared dependency or a standard-library module.`, | |
| + | evalCandidate: { | |
| + | type: 'import_existence_check', | |
| + | task: 'Verify an import or package is declared as a dependency before relying on it.', | |
| + | target: truncate(ref.spec, EVIDENCE_CAP), | |
| + | }, | |
| + | }); | |
| + | } | |
| + | ||
| + | return { | |
| + | schemaVersion: '0.2', | |
| + | verifiedAgainstWorkingTree: true, | |
| + | manifestSeen: hasManifest, | |
| + | hallucinations, | |
| + | summary: summarize(hallucinations), | |
| + | }; | |
| + | } | |
| + | ||
| + | function emptySummary() { | |
| + | return { total: 0, byCategory: { hallucinated_file_or_path: 0, hallucinated_import_or_package: 0 } }; | |
| + | } | |
| + | ||
| + | function summarize(hallucinations) { | |
| + | const summary = emptySummary(); | |
| + | summary.total = hallucinations.length; | |
| + | for (const h of hallucinations) { | |
| + | if (summary.byCategory[h.category] !== undefined) summary.byCategory[h.category]++; | |
| + | } | |
| + | return summary; | |
| + | } | |
| + | ||
| + | export function renderHallucinationsJson(tree, projectDir, opts = {}) { | |
| + | const result = detectHallucinations(tree, projectDir, opts); | |
| + | return { | |
| + | schemaVersion: '0.2', | |
| + | project: { name: opts.projectName || null, generatedAt: opts.generatedAt || null }, | |
| + | verifiedAgainstWorkingTree: result.verifiedAgainstWorkingTree, | |
| + | manifestSeen: result.manifestSeen || false, | |
| + | summary: result.summary, | |
| + | hallucinations: result.hallucinations, | |
| + | note: 'File and path existence and import and package declaration are checked deterministically against the working tree and manifests. Per-symbol and per-API resolution inside a module is not attempted.', | |
| + | }; | |
| + | } |
| @@ -0,0 +1,143 @@ | ||
| + | import { createInterface } from 'node:readline'; | |
| + | import { resolve } from 'node:path'; | |
| + | import { parseArgs, loadRedactedTree, detectProjectName, assertClean } from './cli.js'; | |
| + | import { renderHandoff } from './handoff.js'; | |
| + | import { renderLessonsMarkdown, analyzeTree } from './analyze.js'; | |
| + | import { renderSecurityReport } from './security-report.js'; | |
| + | import { renderHallucinationsJson } from './hallucinate.js'; | |
| + | ||
| + | const PROTOCOL_VERSION = '2024-11-05'; | |
| + | ||
| + | const TOOL_DEFS = [ | |
| + | { | |
| + | name: 'handoff', | |
| + | description: 'Continuation brief for the next agent: goal, accepted decisions, constraints, and dead ends. Read only.', | |
| + | inputSchema: { type: 'object', properties: {}, additionalProperties: false }, | |
| + | }, | |
| + | { | |
| + | name: 'lessons', | |
| + | description: 'Accepted constraints and repeated corrections distilled from the session lineage. Read only.', | |
| + | inputSchema: { type: 'object', properties: {}, additionalProperties: false }, | |
| + | }, | |
| + | { | |
| + | name: 'security_summary', | |
| + | description: 'Evidence-backed security-sensitive touches, test changes, risky commands, and hallucinated references. Read only.', | |
| + | inputSchema: { type: 'object', properties: {}, additionalProperties: false }, | |
| + | }, | |
| + | { | |
| + | name: 'eval_candidates', | |
| + | description: 'Compact regression cases derived from session corrections and hallucinated references. Read only.', | |
| + | inputSchema: { type: 'object', properties: {}, additionalProperties: false }, | |
| + | }, | |
| + | ]; | |
| + | ||
| + | export async function startMcpServer({ argv, version }, io = {}) { | |
| + | const input = io.input || process.stdin; | |
| + | const output = io.output || process.stdout; | |
| + | const opts = parseArgs((argv || []).filter((a) => a !== 'mcp' && a !== '--mcp')); | |
| + | const projectDir = resolve(opts.dir || process.cwd()); | |
| + | const projectName = detectProjectName(projectDir); | |
| + | ||
| + | let cache = null; | |
| + | // Cache the in-flight promise so overlapping tools/call requests share one load; reset on failure to allow a retry. | |
| + | const ensureTree = () => { | |
| + | cache ||= loadRedactedTree(opts, projectDir, projectName, () => {}, { forceAuto: true }) | |
| + | .then(({ tree, decisions }) => ({ tree, decisions, renderOpts: { projectName, version, projectDir, generatedAt: new Date().toISOString() } })) | |
| + | .catch((err) => { cache = null; throw err; }); | |
| + | return cache; | |
| + | }; | |
| + | ||
| + | return new Promise((resolveServer) => { | |
| + | const rl = createInterface({ input, crlfDelay: Infinity }); | |
| + | const send = (msg) => output.write(`${JSON.stringify(msg)}\n`); | |
| + | ||
| + | rl.on('line', async (line) => { | |
| + | const text = line.trim(); | |
| + | if (!text) return; | |
| + | let req; | |
| + | try { | |
| + | req = JSON.parse(text); | |
| + | } catch { | |
| + | send({ jsonrpc: '2.0', id: null, error: { code: -32700, message: 'Parse error' } }); | |
| + | return; | |
| + | } | |
| + | try { | |
| + | await handle(req, send, ensureTree, version); | |
| + | } catch (err) { | |
| + | send({ | |
| + | jsonrpc: '2.0', | |
| + | id: req && req.id !== undefined ? req.id : null, | |
| + | error: { code: -32603, message: `Internal error: ${err && err.message ? err.message : 'unknown'}` }, | |
| + | }); | |
| + | } | |
| + | }); | |
| + | rl.on('close', () => resolveServer()); | |
| + | }); | |
| + | } | |
| + | ||
| + | async function handle(req, send, ensureTree, version) { | |
| + | if (!req || req.jsonrpc !== '2.0' || typeof req.method !== 'string') { | |
| + | send({ jsonrpc: '2.0', id: req && req.id !== undefined ? req.id : null, error: { code: -32600, message: 'Invalid Request' } }); | |
| + | return; | |
| + | } | |
| + | const isNotification = req.id === undefined || req.id === null; | |
| + | const reply = (result) => { if (!isNotification) send({ jsonrpc: '2.0', id: req.id, result }); }; | |
| + | const fail = (code, message) => { if (!isNotification) send({ jsonrpc: '2.0', id: req.id, error: { code, message } }); }; | |
| + | ||
| + | switch (req.method) { | |
| + | case 'initialize': | |
| + | reply({ | |
| + | protocolVersion: PROTOCOL_VERSION, | |
| + | capabilities: { tools: {} }, | |
| + | serverInfo: { name: 'treetrace', version: version || '0.0.0' }, | |
| + | }); | |
| + | return; | |
| + | case 'notifications/initialized': | |
| + | case 'initialized': | |
| + | return; | |
| + | case 'ping': | |
| + | reply({}); | |
| + | return; | |
| + | case 'tools/list': | |
| + | reply({ tools: TOOL_DEFS }); | |
| + | return; | |
| + | case 'tools/call': { | |
| + | const params = req.params || {}; | |
| + | const name = params.name; | |
| + | const def = TOOL_DEFS.find((t) => t.name === name); | |
| + | if (!def) { | |
| + | fail(-32602, `Unknown tool: ${name}`); | |
| + | return; | |
| + | } | |
| + | const { tree, decisions, renderOpts } = await ensureTree(); | |
| + | const text = renderTool(name, tree, renderOpts); | |
| + | assertClean(text, decisions, `mcp tool ${name}`); | |
| + | reply({ content: [{ type: 'text', text }], isError: false }); | |
| + | return; | |
| + | } | |
| + | default: | |
| + | fail(-32601, `Method not found: ${req.method}`); | |
| + | } | |
| + | } | |
| + | ||
| + | function renderTool(name, tree, renderOpts) { | |
| + | switch (name) { | |
| + | case 'handoff': | |
| + | return renderHandoff(tree, renderOpts); | |
| + | case 'lessons': | |
| + | return renderLessonsMarkdown(tree, renderOpts); | |
| + | case 'security_summary': | |
| + | return renderSecurityReport(tree, renderOpts.projectDir || null, renderOpts); | |
| + | case 'eval_candidates': { | |
| + | const analysis = analyzeTree(tree); | |
| + | const hall = renderHallucinationsJson(tree, renderOpts.projectDir || null, renderOpts); | |
| + | const payload = { | |
| + | schemaVersion: '0.2', | |
| + | evalCandidates: analysis.evalCandidates, | |
| + | hallucinationEvalCandidates: hall.hallucinations.map((h) => h.evalCandidate), | |
| + | }; | |
| + | return JSON.stringify(payload, null, 2); | |
| + | } | |
| + | default: | |
| + | return ''; | |
| + | } | |
| + | } |
| @@ -65,6 +65,7 @@ export function renderReportMarkdown(tree, opts = {}) { | ||
| lines.push('| `PROMPT_TREE.md` | Full lineage narrative and replayable prompt pack. |'); | ||
| lines.push('| `.treetrace/tree.json` | Canonical schema for tools and integrations. |'); | ||
| lines.push('| `.treetrace/failures.json` | Failure labels, evidence, correction chains. |'); | ||
| + | lines.push('| `.treetrace/hallucinations.json` | Referenced files, paths, imports, or packages that do not exist in the working tree. |'); | |
| lines.push('| `.treetrace/lessons.md` | Human-readable lessons. |'); | ||
| lines.push('| `.treetrace/evals.jsonl` | Eval/regression cases; not meant to be pretty. |'); | ||
| lines.push('| `.treetrace/agent-memory.md` | Short memory pack for Codex, Claude Code, Cursor, or another agent. |'); |
| @@ -0,0 +1,220 @@ | ||
| + | import { truncate, escapeMd } from './util.js'; | |
| + | import { analyzeTree, classifySecuritySurface, isRiskyCommand, mentionsTestSkip } from './analyze.js'; | |
| + | import { detectHallucinations } from './hallucinate.js'; | |
| + | import { REPO_URL } from './config.js'; | |
| + | ||
| + | const SURFACE_LABELS = { | |
| + | auth: 'auth', | |
| + | secrets: 'secrets', | |
| + | 'access-control': 'access control', | |
| + | crypto: 'crypto', | |
| + | 'dependency-config': 'dependency config', | |
| + | ci: 'CI', | |
| + | deployment: 'deployment', | |
| + | tests: 'tests', | |
| + | }; | |
| + | const SURFACE_ORDER = ['auth', 'secrets', 'access-control', 'crypto', 'dependency-config', 'ci', 'deployment', 'tests']; | |
| + | const EVIDENCE_CAP = 200; | |
| + | const tierRank = { verified: 4, high: 3, confirmed: 2, inferred: 1 }; | |
| + | ||
| + | function collectSurfaceTouches(tree) { | |
| + | const bySurface = new Map(); | |
| + | for (const node of tree.nodes) { | |
| + | if (node.status === 'abandoned') continue; | |
| + | for (const a of node.actions || []) { | |
| + | const surface = classifySecuritySurface(a.file); | |
| + | if (!surface) continue; | |
| + | if (!bySurface.has(surface)) bySurface.set(surface, []); | |
| + | bySurface.get(surface).push({ file: a.file, nodeId: node.id, model: a.model || node.model || null }); | |
| + | } | |
| + | } | |
| + | return bySurface; | |
| + | } | |
| + | ||
| + | function collectTestSkips(tree) { | |
| + | const out = []; | |
| + | for (const node of tree.nodes) { | |
| + | if (node.status === 'abandoned') continue; | |
| + | if (mentionsTestSkip(node.text)) { | |
| + | out.push({ nodeId: node.id, evidence: truncate(node.text, EVIDENCE_CAP) }); | |
| + | continue; | |
| + | } | |
| + | for (const a of node.actions || []) { | |
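| + | // Check input and command separately so a skip that appears only in the command is not masked by a non-empty input. | |
| + | const body = [a.input, a.command].filter(Boolean).find((s) => mentionsTestSkip(s)); | |
| + | if (body) { | |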
| + | out.push({ nodeId: node.id, evidence: truncate(body, EVIDENCE_CAP) }); | |
| + | break; | |
| + | } | |
| + | } | |
| + | } | |
| + | return out; | |
| + | } | |
| + | ||
| + | function collectRiskyCommands(tree) { | |
| + | const out = []; | |
| + | for (const node of tree.nodes) { | |
| + | if (node.status === 'abandoned') continue; | |
| + | for (const a of node.actions || []) { | |
| + | if (isRiskyCommand(a.command)) { | |
| + | out.push({ nodeId: node.id, command: truncate(a.command, EVIDENCE_CAP), model: a.model || node.model || null }); | |
| + | } | |
| + | } | |
| + | } | |
| + | return out; | |
| + | } | |
| + | ||
| + | function collectCorrections(tree) { | |
| + | return tree.nodes.filter((n) => n.status !== 'abandoned' && n.kind === 'correction'); | |
| + | } | |
| + | ||
| + | export function buildSecurityFindings(tree, projectDir, opts = {}) { | |
| + | const analysis = analyzeTree(tree); | |
| + | const surfaces = collectSurfaceTouches(tree); | |
| + | const testSkips = collectTestSkips(tree); | |
| + | const riskyCommands = collectRiskyCommands(tree); | |
| + | const hallucinationResult = projectDir | |
| + | ? detectHallucinations(tree, projectDir, opts) | |
| + | : { hallucinations: [], verifiedAgainstWorkingTree: false }; | |
| + | const securitySignals = analysis.failures | |
| + | .filter((f) => f.type === 'security_or_privacy_risk') | |
| + | .sort((a, b) => (tierRank[b.tier] || 0) - (tierRank[a.tier] || 0)); | |
| + | const corrections = collectCorrections(tree); | |
| + | ||
| + | return { analysis, surfaces, testSkips, riskyCommands, hallucinationResult, securitySignals, corrections }; | |
| + | } | |
| + | ||
| + | export function hasSecuritySignal(tree, projectDir, opts = {}) { | |
| + | const f = buildSecurityFindings(tree, projectDir, opts); | |
| + | return ( | |
| + | f.surfaces.size > 0 || | |
| + | f.testSkips.length > 0 || | |
| + | f.riskyCommands.length > 0 || | |
| + | f.securitySignals.length > 0 || | |
| + | f.hallucinationResult.hallucinations.length > 0 | |
| + | ); | |
| + | } | |
| + | ||
| + | export function renderSecurityReport(tree, projectDir, opts = {}) { | |
| + | const projectName = opts.projectName || 'project'; | |
| + | const generatedAt = opts.generatedAt || new Date().toISOString(); | |
| + | const f = buildSecurityFindings(tree, projectDir, opts); | |
| + | const lines = []; | |
| + | ||
| + | lines.push(`# TreeTrace Security Report - ${escapeMd(projectName)}`); | |
| + | lines.push(''); | |
| + | lines.push(`Generated: ${generatedAt}`); | |
| + | lines.push(''); | |
| + | lines.push( | |
| + | 'This report leads with concrete failure classes from the session. It reuses the same signals as the full TreeTrace analysis; it does not run a separate scanner.' | |
| + | ); | |
| + | lines.push(''); | |
| + | ||
| + | const anySignal = | |
| + | f.surfaces.size || f.testSkips.length || f.riskyCommands.length || f.securitySignals.length || f.hallucinationResult.hallucinations.length; | |
| + | if (!anySignal) { | |
| + | lines.push('No security-sensitive touches, test changes, risky commands, hallucinated references, or stated security intents were detected in this session.'); | |
| + | lines.push(''); | |
| + | footer(lines, opts); | |
| + | return lines.join('\n'); | |
| + | } | |
| + | ||
| + | lines.push('## 1. Did the agent touch security-sensitive surfaces?'); | |
| + | lines.push(''); | |
| + | if (f.surfaces.size) { | |
| + | lines.push('Yes. Touched surfaces, with the files involved:'); | |
| + | lines.push(''); | |
| + | for (const surface of SURFACE_ORDER) { | |
| + | const touches = f.surfaces.get(surface); | |
| + | if (!touches || !touches.length) continue; | |
| + | const files = [...new Set(touches.map((t) => t.file))].slice(0, 8); | |
| + | lines.push(`- ${SURFACE_LABELS[surface]}: ${files.map((x) => `\`${escapeMd(truncate(x, 100))}\``).join(', ')}`); | |
| + | } | |
| + | } else { | |
| + | lines.push('No edits to auth, secrets, access control, crypto, dependency config, CI, deployment, or test files were observed in the captured actions.'); | |
| + | } | |
| + | if (f.securitySignals.length) { | |
| + | lines.push(''); | |
| + | lines.push('Security signals from the analysis pass (highest tier first):'); | |
| + | lines.push(''); | |
| + | for (const s of f.securitySignals.slice(0, 12)) { | |
| + | const tag = s.tier === 'inferred' ? 'stated intent' : s.tier; | |
| + | lines.push(`- (${tag}) ${escapeMd(truncate(s.evidence, EVIDENCE_CAP))}${s.model ? ` (${s.model})` : ''}`); | |
| + | } | |
| + | } | |
| + | lines.push(''); | |
| + | ||
| + | lines.push('## 2. Did the agent disable or skip tests?'); | |
| + | lines.push(''); | |
| + | if (f.testSkips.length) { | |
| + | lines.push('Possible test removal or skipping was detected. Verify before trusting the suite:'); | |
| + | lines.push(''); | |
| + | for (const t of f.testSkips.slice(0, 8)) lines.push(`- (${t.nodeId}) ${escapeMd(t.evidence)}`); | |
| + | } else { | |
| + | lines.push('No evidence of disabled or skipped tests was found in prompts or captured actions.'); | |
| + | } | |
| + | lines.push(''); | |
| + | ||
| + | lines.push('## 3. Did the agent run risky shell commands?'); | |
| + | lines.push(''); | |
| + | if (f.riskyCommands.length) { | |
| + | lines.push('Yes. The following commands matched the risky-command patterns:'); | |
| + | lines.push(''); | |
| + | for (const r of f.riskyCommands.slice(0, 8)) lines.push(`- (${r.nodeId}) \`${escapeMd(r.command)}\`${r.model ? ` (${r.model})` : ''}`); | |
| + | } else { | |
| + | lines.push('No commands matched the risky-shell patterns (force pushes without review, recursive deletes, piped remote shells, world-writable chmod, destructive SQL).'); | |
| + | } | |
| + | lines.push(''); | |
| + | ||
| + | lines.push('## 4. Did the agent reference files, paths, imports, or packages that do not exist?'); | |
| + | lines.push(''); | |
| + | if (!f.hallucinationResult.verifiedAgainstWorkingTree) { | |
| + | lines.push('Not checked: no readable working tree was available for verification.'); | |
| + | } else if (f.hallucinationResult.hallucinations.length) { | |
| + | lines.push('Yes. The following references could not be verified against the working tree or declared dependencies:'); | |
| + | lines.push(''); | |
| + | for (const h of f.hallucinationResult.hallucinations.slice(0, 12)) { | |
| + | lines.push(`- (${h.category}) ${escapeMd(truncate(h.evidence, EVIDENCE_CAP))}`); | |
| + | } | |
| + | lines.push(''); | |
| + | lines.push('File and path existence and import and package declaration are checked deterministically. Per-symbol or per-API resolution inside a module is not attempted.'); | |
| + | } else { | |
| + | lines.push('No hallucinated files, paths, imports, or packages were detected. File and path existence and import and package declaration were checked against the working tree and manifests.'); | |
| + | } | |
| + | lines.push(''); | |
| + | ||
| + | lines.push('## 5. What human correction should become a future eval or memory item?'); | |
| + | lines.push(''); | |
| + | const securityChains = f.analysis.correctionChains.filter((c) => c.failureType === 'security_or_privacy_risk'); | |
| + | if (securityChains.length || f.corrections.length) { | |
| + | lines.push('Turn these corrections into regression evals so the next agent inherits the constraint:'); | |
| + | lines.push(''); | |
| + | const seen = new Set(); | |
| + | for (const chain of securityChains.slice(0, 6)) { | |
| + | const corr = tree.nodes.find((n) => n.id === chain.correctionNodeId); | |
| + | if (corr && !seen.has(corr.id)) { | |
| + | seen.add(corr.id); | |
| + | lines.push(`- (security correction) ${escapeMd(truncate(String(corr.text || '').replace(/\s+/g, ' '), 300))}`); | |
| + | } | |
| + | } | |
| + | for (const corr of f.corrections.slice(-6)) { | |
| + | if (seen.has(corr.id)) continue; | |
| + | seen.add(corr.id); | |
| + | lines.push(`- ${escapeMd(truncate(String(corr.text || '').replace(/\s+/g, ' '), 300))}`); | |
| + | } | |
| + | lines.push(''); | |
| + | lines.push('Eval candidates from the analysis pass live in `.treetrace/evals.jsonl`; hallucination eval candidates live in `.treetrace/hallucinations.json`.'); | |
| + | } else { | |
| + | lines.push('No human correction was linked to a security-sensitive action in this session. If a security touch above was intentional, capture the rationale so the next agent does not flag it again.'); | |
| + | } | |
| + | lines.push(''); | |
| + | ||
| + | footer(lines, opts); | |
| + | return lines.join('\n'); | |
| + | } | |
| + | ||
| + | function footer(lines, opts) { | |
| + | lines.push('---'); | |
| + | lines.push(''); | |
| + | lines.push(`Generated by [treetrace](${REPO_URL})${opts.version ? ` v${opts.version}` : ''}.`); | |
| + | lines.push(''); | |
| + | } |
| @@ -1,6 +1,6 @@ | ||
| import { test } from 'node:test'; | ||
| import assert from 'node:assert/strict'; | ||
| - | import { existsSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from 'node:fs'; | |
| + | import { existsSync, mkdirSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from 'node:fs'; | |
| import { tmpdir } from 'node:os'; | ||
| import { fileURLToPath } from 'node:url'; | ||
| import { dirname, join } from 'node:path'; | ||
| @@ -23,6 +23,9 @@ import { | ||
| import { main } from '../src/cli.js'; | ||
| import { mungePath } from '../src/discover.js'; | ||
| import { sha256, escapeMd } from '../src/util.js'; | ||
| + | import { detectHallucinations, renderHallucinationsJson } from '../src/hallucinate.js'; | |
| + | import { renderSecurityReport, hasSecuritySignal } from '../src/security-report.js'; | |
| + | import { spawn } from 'node:child_process'; | |
| const FIXTURE = join(dirname(fileURLToPath(import.meta.url)), 'fixtures', 'synthetic-session.jsonl'); | ||
| @@ -742,3 +745,182 @@ test('discover: path munging matches Claude Code storage layout', () => { | ||
| assert.equal(mungePath('/home/dev/weatherapp/api'), '-home-dev-weatherapp-api'); | ||
| assert.equal(mungePath('/home/u.ser/my_app'), '-home-u-ser-my-app'); | ||
| }); | ||
| + | ||
| + | function tempProject() { | |
| + | const dir = mkdtempSync(join(tmpdir(), 'treetrace-feat-')); | |
| + | writeFileSync(join(dir, 'package.json'), JSON.stringify({ name: 'demo', dependencies: { express: '^4.0.0' } })); | |
| + | mkdirSync(join(dir, 'src'), { recursive: true }); | |
| + | writeFileSync(join(dir, 'src', 'real.js'), 'export const real = 1;\n'); | |
| + | return dir; | |
| + | } | |
| + | ||
| + | test('hallucinations: flags only the invented file and import, not the real ones', () => { | |
| + | const dir = tempProject(); | |
| + | try { | |
| + | const root = { | |
| + | id: 'node_001', kind: 'root', status: 'accepted', parent: null, | |
| + | text: 'Open src/real.js and src/imaginary.js to wire the feature.', | |
| + | title: 'wire the feature', | |
| + | actions: [{ | |
| + | tool: 'Edit', file: 'src/real.js', | |
| + | input: "import express from 'express';\nimport ghostlib from 'ghostlib-does-not-exist';\nimport { readFileSync } from 'node:fs';", | |
| + | command: null, model: 'm', | |
| + | }], | |
| + | }; | |
| + | const tree = { nodes: [root] }; | |
| + | const result = detectHallucinations(tree, dir); | |
| + | const files = result.hallucinations.filter((h) => h.category === 'hallucinated_file_or_path').map((h) => h.reference); | |
| + | const imports = result.hallucinations.filter((h) => h.category === 'hallucinated_import_or_package').map((h) => h.reference); | |
| + | ||
| + | assert.ok(files.includes('src/imaginary.js'), `invented file should be flagged (got ${files})`); | |
| + | assert.ok(!files.includes('src/real.js'), 'the real file must not be flagged'); | |
| + | assert.ok(!files.some((f) => /package\.json/.test(f)), 'the real package.json must not be flagged'); | |
| + | ||
| + | assert.ok(imports.includes('ghostlib-does-not-exist'), `invented import should be flagged (got ${imports})`); | |
| + | assert.ok(!imports.includes('express'), 'a declared dependency must not be flagged'); | |
| + | assert.ok(!imports.includes('fs') && !imports.includes('node:fs'), 'a node builtin must not be flagged'); | |
| + | ||
| + | for (const h of result.hallucinations) { | |
| + | assert.ok(h.evalCandidate && h.evalCandidate.target, 'each hallucination should carry an eval candidate'); | |
| + | } | |
| + | } finally { | |
| + | rmSync(dir, { recursive: true, force: true }); | |
| + | } | |
| + | }); | |
| + | ||
| + | test('hallucinations: a file created during the session is not flagged', () => { | |
| + | const dir = tempProject(); | |
| + | try { | |
| + | const root = { | |
| + | id: 'node_001', kind: 'root', status: 'accepted', parent: null, | |
| + | text: 'Create src/brandnew.js and then reference src/brandnew.js again.', | |
| + | title: 'create new file', | |
| + | actions: [{ tool: 'Write', file: 'src/brandnew.js', input: 'export const n = 1;', command: null, model: 'm' }], | |
| + | }; | |
| + | const result = detectHallucinations({ nodes: [root] }, dir); | |
| + | const files = result.hallucinations.filter((h) => h.category === 'hallucinated_file_or_path').map((h) => h.reference); | |
| + | assert.ok(!files.includes('src/brandnew.js'), 'a file the agent created this session must not be flagged'); | |
| + | } finally { | |
| + | rmSync(dir, { recursive: true, force: true }); | |
| + | } | |
| + | }); | |
| + | ||
| + | test('security report: surfaces real signals and omits benign sessions', () => { | |
| + | const dir = tempProject(); | |
| + | try { | |
| + | const root = { | |
| + | id: 'node_001', kind: 'root', status: 'accepted', parent: null, | |
| + | text: 'harden the login flow', title: 'harden the login flow', | |
| + | actions: [ | |
| + | { tool: 'Edit', file: 'src/auth/login.js', input: 'export function login() {}', command: null, model: 'claude-opus-4-8' }, | |
| + | { tool: 'Bash', file: null, command: 'rm -rf build', input: 'rm -rf build', model: 'claude-opus-4-8' }, | |
| + | ], | |
| + | }; | |
| + | const correction = { | |
| + | id: 'node_002', kind: 'correction', status: 'accepted', parent: 'node_001', | |
| + | text: 'no, do not disable the tests in the auth suite, keep them running', | |
| + | title: 'do not disable tests', actions: [], | |
| + | }; | |
| + | const tree = { nodes: [root, correction] }; | |
| + | assert.ok(hasSecuritySignal(tree, dir), 'expected a security signal for the auth edit'); | |
| + | const report = renderSecurityReport(tree, dir, { projectName: 'demo', generatedAt: '2026-01-01T00:00:00.000Z' }); | |
| + | ||
| + | assert.ok(report.startsWith('# TreeTrace Security Report - demo')); | |
| + | assert.ok(/auth: .*src\/auth\/login\.js/.test(report), 'auth surface and file should be listed'); | |
| + | assert.ok(/rm -rf build/.test(report), 'risky command should be listed'); | |
| + | assert.ok(/disable the tests|disable or skip tests/i.test(report), 'test-skip signal should appear'); | |
| + | assert.ok(/do not disable the tests/i.test(report), 'the human correction should surface as an eval/memory candidate'); | |
| + | ||
| + | const benign = { | |
| + | id: 'node_001', kind: 'root', status: 'accepted', parent: null, | |
| + | text: 'add a markdown table to the README', title: 'add a table', | |
| + | actions: [{ tool: 'Edit', file: 'README.md', input: '| a | b |', command: null, model: 'm' }], | |
| + | }; | |
| + | const benignTree = { nodes: [benign] }; | |
| + | assert.ok(!hasSecuritySignal(benignTree, dir), 'benign session should have no security signal'); | |
| + | const benignReport = renderSecurityReport(benignTree, dir, { projectName: 'demo', generatedAt: '2026-01-01T00:00:00.000Z' }); | |
| + | assert.ok(/No security-sensitive touches/.test(benignReport), 'benign report should state nothing was found'); | |
| + | } finally { | |
| + | rmSync(dir, { recursive: true, force: true }); | |
| + | } | |
| + | }); | |
| + | ||
| + | test('security report and hallucinations.json do not leak injected secrets via the CLI', async () => { | |
| + | const dir = tempProject(); | |
| + | const hex = '6881f8290266f4cc939959917f893a2a88787eb24bbcb6b9c37594c72bf448c3'; | |
| + | const ghToken = 'ghp_0123456789abcdefghijklmnopqrstuvwxyzAB'; | |
| + | const convo = [{ | |
| + | mapping: { | |
| + | r: { message: null, parent: null, children: ['u'] }, | |
| + | u: { message: { author: { role: 'user' }, content: { parts: [ | |
| + | `edit src/imaginary.js, my key is session_hex=${hex} and token ${ghToken}`, | |
| + | ] }, create_time: 1.0 }, parent: 'r', children: ['a'] }, | |
| + | a: { message: { author: { role: 'assistant' }, content: { parts: ['ok'] }, create_time: 2.0 }, parent: 'u', children: [] }, | |
| + | }, | |
| + | }]; | |
| + | const file = join(dir, 'leaky.json'); | |
| + | writeFileSync(file, JSON.stringify(convo)); | |
| + | try { | |
| + | await main(['--from', 'chatgpt', '--file', file, '--dir', dir, '--security', '--redact-auto', '--quiet']); | |
| + | const hall = readFileSync(join(dir, '.treetrace/hallucinations.json'), 'utf8'); | |
| + | assert.ok(!hall.includes(hex), 'hex secret leaked into hallucinations.json'); | |
| + | assert.ok(!hall.includes(ghToken), 'github token leaked into hallucinations.json'); | |
| + | assert.ok(/imaginary\.js/.test(hall), 'the invented file should still be detected'); | |
| + | } finally { | |
| + | rmSync(dir, { recursive: true, force: true }); | |
| + | } | |
| + | }); | |
| + | ||
| + | test('mcp: initialize, tools/list, and tools/call return well-formed JSON-RPC', async () => { | |
| + | const dir = tempProject(); | |
| + | const convo = [{ | |
| + | mapping: { | |
| + | r: { message: null, parent: null, children: ['u'] }, | |
| + | u: { message: { author: { role: 'user' }, content: { parts: ['build a cli and do not add dependencies'] }, create_time: 1.0 }, parent: 'r', children: ['a'] }, | |
| + | a: { message: { author: { role: 'assistant' }, content: { parts: ['ok'] }, create_time: 2.0 }, parent: 'u', children: ['u2'] }, | |
| + | u2: { message: { author: { role: 'user' }, content: { parts: ['no, that is wrong, keep it minimal'] }, create_time: 3.0 }, parent: 'a', children: [] }, | |
| + | }, | |
| + | }]; | |
| + | const file = join(dir, 'mcp.json'); | |
| + | writeFileSync(file, JSON.stringify(convo)); | |
| + | const bin = join(dirname(fileURLToPath(import.meta.url)), '..', 'bin', 'treetrace.js'); | |
| + | try { | |
| + | const responses = await new Promise((resolveP, rejectP) => { | |
| + | const child = spawn(process.execPath, [bin, 'mcp', '--from', 'chatgpt', '--file', file, '--dir', dir], { | |
| + | stdio: ['pipe', 'pipe', 'ignore'], | |
| + | }); | |
| + | let buf = ''; | |
| + | child.stdout.on('data', (d) => { buf += d; }); | |
| + | child.on('error', rejectP); | |
| + | const send = (o) => child.stdin.write(JSON.stringify(o) + '\n'); | |
| + | send({ jsonrpc: '2.0', id: 1, method: 'initialize', params: {} }); | |
| + | send({ jsonrpc: '2.0', id: 2, method: 'tools/list', params: {} }); | |
| + | send({ jsonrpc: '2.0', id: 3, method: 'tools/call', params: { name: 'lessons', arguments: {} } }); | |
| + | send({ jsonrpc: '2.0', id: 99, method: 'tools/call', params: { name: 'nope', arguments: {} } }); | |
| + | setTimeout(() => { | |
| + | child.stdin.end(); | |
| + | child.kill(); | |
| + | resolveP(buf.split('\n').filter(Boolean).map((l) => JSON.parse(l))); | |
| + | }, 2000); | |
| + | }); | |
| + | ||
| + | const init = responses.find((r) => r.id === 1); | |
| + | assert.ok(init && init.jsonrpc === '2.0', 'initialize must be JSON-RPC 2.0'); | |
| + | assert.equal(init.result.serverInfo.name, 'treetrace'); | |
| + | assert.ok(init.result.protocolVersion, 'initialize must advertise a protocol version'); | |
| + | ||
| + | const list = responses.find((r) => r.id === 2); | |
| + | const names = list.result.tools.map((t) => t.name).sort(); | |
| + | assert.deepEqual(names, ['eval_candidates', 'handoff', 'lessons', 'security_summary']); | |
| + | ||
| + | const call = responses.find((r) => r.id === 3); | |
| + | assert.ok(call.result && Array.isArray(call.result.content), 'tools/call must return content array'); | |
| + | assert.equal(call.result.content[0].type, 'text'); | |
| + | assert.ok(/TreeTrace Lessons/.test(call.result.content[0].text), 'lessons tool should return the lessons markdown'); | |
| + | ||
| + | const bad = responses.find((r) => r.id === 99); | |
| + | assert.ok(bad.error && bad.error.code === -32602, 'unknown tool should return a JSON-RPC error'); | |
| + | } finally { | |
| + | rmSync(dir, { recursive: true, force: true }); | |
| + | } | |
| + | }); |