Reference architecture: brian_evidence_gate_system_260506.md
Supersedes: v5 (brian_evidence_gate_operators_guide_260509.md) and earlier.
v6 (2026-05-09 evening): L4 schema enforcement retired in full by Jonah. The truth-and-effort layer survives in its substantive form only — no output-format mandates remain.
Stops Brian from claiming "done / fixed / shipped / blocked / send-me-X" without verifiable evidence in a tamper-evident ledger. Every tool Brian uses gets logged with an Ed25519-signed hash chain. Every assistant message gets read by the Stop-event gate hook — claim phrases ("done", "fixed", "shipped", "ready", "in place") that have no matching VALIDATOR_PASS receipt in the per-turn ledger get blocked. Override prompts ("skip the test", "force this") get audited; threshold breach surfaces in MEMORY.md. Hook crashes are tracked + threshold-fired so silent fail-open windows are visible. Ask-without-effort patterns ("send me X", "tell me what's on screen", "I'm blocked") with fewer than 3 documented tried: lines get blocked at outbound time.
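As a rough illustration of why a hash-chained ledger is tamper-evident, here is a minimal sketch. The entry fields (`seq`, `tool`, `summary`, `prev_hash`) and the `GENESIS` value are assumptions, not the real ledger format. The demo signer is an HMAC stand-in so the block runs on the standard library alone; the real ledger uses Ed25519, and an Ed25519 signer and verifier would be passed in the same way.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # hypothetical prev_hash for the very first entry


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same body always hashes identically.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(chain: list, tool: str, summary: str, sign) -> dict:
    """Append one entry whose hash covers the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    body = {"seq": len(chain), "tool": tool, "summary": summary, "prev_hash": prev_hash}
    entry_hash = _digest(body)
    entry = {**body, "hash": entry_hash, "sig": sign(entry_hash.encode())}
    chain.append(entry)
    return entry


def verify_chain(chain: list, check_sig) -> bool:
    """Recompute every hash and link; editing any earlier entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        body = {k: entry[k] for k in ("seq", "tool", "summary", "prev_hash")}
        if entry["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
            return False
        if not check_sig(entry["hash"].encode(), entry["sig"]):
            return False
        prev_hash = entry["hash"]
    return True


# Stand-in signer for the demo only: HMAC-SHA256, NOT Ed25519.
_DEMO_KEY = b"demo-key"


def demo_sign(msg: bytes) -> str:
    return hmac.new(_DEMO_KEY, msg, hashlib.sha256).hexdigest()


def demo_check(msg: bytes, sig: str) -> bool:
    return hmac.compare_digest(demo_sign(msg), sig)
```

Because each hash covers `prev_hash`, rewriting one old entry invalidates that entry and every entry after it, which is what lets the gate trust a receipt it finds in the ledger.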
What it does NOT do: dictate any specific output structure for Brian's replies. Brian writes prose in his own voice. The gate checks substance (was the work actually done) not format.
Removed in full:
- brian_l4_schema.py — the schema-enforcement module. Archived to /opt/agent/data/postmortems/_archive_260509_l4_retirement/.
- _log_l4_verdict() and _load_current_contract() helpers in brian-evidence-gate.py.
- The if _l4_mode in ("shadow", "enforce"): wiring block in the gate's main().
- BRIAN_L4_MODE env var in /root/.claude/settings.json (replaced with a gravestone note _L4_RETIRED_260509_NOTE).
- HIGH_STAKES_VERBS, PROOF_REQUEST, NOUN_FORM_HIGH_STAKES, is_high_stakes() from brian-contract-intake.py.
- Contract fields: requires_schema, response_schema, high_stakes_reason, required_response_mode. Gone.
- /opt/agent/scripts/l4_corpus_evaluator.py and any cron line. Archived.
- /opt/agent/data/agent_runtime/l4_shadow/ directory. Archived.
- L4-only test files: test_d7_l4_schema.py, test_d8_l4_wiring.py, test_d10_requires_schema.py. Archived.
- D1 C16 cases — replaced by a single regression assertion that the L4 fields are absent from contracts (defends against accidental re-introduction).
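The single regression assertion mentioned in the last item might look roughly like the sketch below. The helper name `leaked_l4_fields` and the sample contract are hypothetical; the real test presumably loads a contract produced by brian-contract-intake.py rather than a literal dict.

```python
# The four contract fields listed above as removed.
RETIRED_L4_FIELDS = frozenset(
    {"requires_schema", "response_schema", "high_stakes_reason", "required_response_mode"}
)


def leaked_l4_fields(contract: dict) -> set:
    """Return any retired L4 keys found at any nesting depth of a contract."""
    found = set()
    stack = [contract]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            found |= RETIRED_L4_FIELDS.intersection(node)  # iterating a dict yields its keys
            stack.extend(node.values())
        elif isinstance(node, list):
            stack.extend(node)
    return found


def test_contract_has_no_l4_fields():
    # Hypothetical contract; a real test would parse the YAML the intake hook wrote.
    contract = {
        "objective": "restart the worker",
        "target_fields": {"service": "worker"},
        "expected_verification": ["command_zero_exit"],
    }
    assert leaked_l4_fields(contract) == set()
```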
Why retired: forcing Brian to wrap "done" claims in a structured brian_final JSON block — even narrowed to rare high-stakes turns — is the same family of mechanism as the voice cops killed 2026-05-08. Both shape Brian's output to satisfy a hook. Substance comes from the validator + signed ledger + prose-gate; the schema layer added format, not safety. If Brian isn't honest in prose, no JSON block fixes him; he can output a perfectly-formed receipt pointing to the wrong ledger entry and the gate would pass it.
- brian-review-gate.mjs: quality gate (legacy).
- brian-presend-evidence-gate.py: catches deferral / ask-without-proof. Blocks when a message contains "send me X" / "tell me Y" / "I'm blocked" without ≥3 tried: evidence markers (in-message + 1h ledger combined).
- brian-evidence-gate-sandboxed.sh → brian-evidence-gate.py: extracts claim phrases ("done", "fixed", "shipped", "ready", "in place"), checks substantive-turn framing (≥3 tools + >150 chars without explicit partial/question marker = implicit completion claim), and blocks if no signature-verified VALIDATOR_PASS exists in the per-turn ledger.

Block-cascade dedupe. All three Stop hooks share _brian_stop_dedupe.py. The first hook to emit a block writes a marker keyed on (session_id, transcript_path) with a 5s TTL; subsequent Stop hooks defer with a stderr note. This surfaces ONE controlling block reason per Stop event.
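The block-cascade dedupe can be pictured with a small file-marker sketch. This is not the contents of _brian_stop_dedupe.py: the marker directory, the file naming and the `claim_block` function are assumptions for illustration, and the read-then-write here is not atomic, which a real implementation would need to handle.

```python
import hashlib
import os
import sys
import tempfile
import time

TTL_SECONDS = 5.0
# Hypothetical marker location; the real path is not given in this guide.
MARKER_DIR = os.path.join(tempfile.gettempdir(), "brian_stop_dedupe_demo")


def _marker_path(session_id: str, transcript_path: str) -> str:
    key = hashlib.sha256(repr((session_id, transcript_path)).encode()).hexdigest()
    return os.path.join(MARKER_DIR, key + ".marker")


def claim_block(session_id: str, transcript_path: str, now=None) -> bool:
    """Return True if this hook is the first to block for this Stop event."""
    now = time.time() if now is None else now
    os.makedirs(MARKER_DIR, exist_ok=True)
    path = _marker_path(session_id, transcript_path)
    try:
        with open(path) as fh:
            written_at = float(fh.read().strip())
        if 0 <= now - written_at < TTL_SECONDS:
            # A sibling hook already owns the block; defer quietly.
            print("dedupe: another Stop hook already blocked; deferring", file=sys.stderr)
            return False
    except (FileNotFoundError, ValueError):
        pass  # no marker, or an unreadable one: treat as absent
    with open(path, "w") as fh:
        fh.write(repr(now))
    return True
```

A hook would call `claim_block(...)` just before emitting its block reason and stay silent when it returns False, so only one reason reaches the user per Stop event.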
- brian-contract-intake.py: when the prompt has action verbs, writes /contracts/<sid>/<seq>.yml with objective, structured target_fields, capability ladder, expected_verification, escalation_conditions. Validators bind to target_fields. No more requires_schema / response_schema fields (post-v6).
- brian-override-audit.py: detects gate-bypass phrasing in prompts; logs and thresholds at 3/24h.
- brian-evidence-ledger.py: appends an Ed25519-signed hash-chained JSONL entry per tool call.
- brian-test-marker-guard.py: banner if test fixtures leaked into prod overrides.

| Rail | Mode | Why |
|---|---|---|
| brian-account-verify-guard | BLOCK | Wrong-account writes are real damage |
| brian-no-paid-model-guard | BLOCK | Money out; locked since the $41 leak 2026-04-30 |
| brian-agency-pipeline-guard | BLOCK | Wrong publishing path damages Brian's brand |
| brian-mac-chrome-profile-guard | BLOCK for --guest/--incognito only | Real session damage |
| brian-outbound-ask-guard | BLOCK on Bash/TG/WA outbound | 3-attempts rule: ask Jonah only after documented effort |
| brian-mac-chrome-profile-guard | WARN for non-Default profile | Quality-of-output, not damage |
| brian-mac-status-box-guard | WARN | Generic status strings hurt Jonah's visibility, don't damage anything |
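The 3-attempts rule behind brian-outbound-ask-guard and the presend gate can be sketched as a simple count. The regexes below and the assumption that each attempt is a line starting with `tried:` are illustrative guesses, not the guards' real patterns; `ledger_tried_last_hour` stands in for whatever the real gate reads from the 1h ledger window.

```python
import re

# Hypothetical ask patterns modelled on the phrases quoted in this guide.
ASK_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (r"\bsend me\b", r"\btell me\b", r"\bI['’]?m blocked\b")
]
# Assumption: one documented attempt == one line beginning with "tried:".
TRIED_MARKER = re.compile(r"^\s*tried:", re.IGNORECASE | re.MULTILINE)
MIN_ATTEMPTS = 3


def should_block_ask(message: str, ledger_tried_last_hour: int = 0) -> bool:
    """Block an outbound ask unless at least 3 attempts are documented."""
    if not any(p.search(message) for p in ASK_PATTERNS):
        return False  # not an ask, nothing to gate
    attempts = len(TRIED_MARKER.findall(message)) + ledger_tried_last_hour
    return attempts < MIN_ATTEMPTS
```

In-message markers and ledger attempts are summed, matching the "in-message + 1h ledger combined" wording, so effort already recorded in the ledger does not have to be repeated in the message.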
/opt/agent/scripts/brian_validator.py runs validators (command_zero_exit, pytest_pass, file_diff_present, http_2xx, ladder_exhaustion_proof, fetch_attempt_proof), verifies args bind to contract.target_fields, and writes an Ed25519-signed VALIDATOR_PASS.

your prompt → contract-intake hook writes contract YAML w/ structured target_fields
(NO schema fields — Brian writes replies in his own voice)
↓
Brian uses tools → ledger hook writes hash-chained Ed25519-signed entries
↓
Brian invokes brian_validator.py → checks args HARD-BIND to contract.target_fields,
scans ledger, writes Ed25519-signed VALIDATOR_PASS
↓
Brian drafts reply (free prose) → presend gate (effort) + evidence-gate (substance)
→ BLOCK or PASS
→ block-cascade dedupe surfaces ONE reason if multiple
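The "HARD-BIND" step in the flow above is the part that stops a validator from being run against the wrong target. A minimal sketch, assuming the binding is an exact-match comparison between validator arguments and the contract's target_fields (the real rule in brian_validator.py may be looser or stricter); `binding_errors` is a hypothetical name:

```python
def binding_errors(validator_args: dict, target_fields: dict) -> list:
    """Return mismatches between validator args and contract target_fields.

    An empty list means the validator run is bound to the contract.
    """
    errors = []
    for name, expected in target_fields.items():
        if name not in validator_args:
            errors.append(f"missing arg: {name}")
        elif validator_args[name] != expected:
            errors.append(
                f"{name}: got {validator_args[name]!r}, contract says {expected!r}"
            )
    return errors
```

Under this assumption, a passing pytest run on an unrelated path yields a non-empty error list, so no VALIDATOR_PASS would be written for the contract.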
If Brian says "done" without a matching VALIDATOR_PASS, the gate blocks. Brian rephrases as partial / blocked / delegation, or runs the validator first.
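The gate's decision can be approximated as below. The claim phrases and the ≥3 tools / >150 chars thresholds come from this guide; the partial/question markers, the ledger entry shape and the precomputed `sig_ok` flag are assumptions made for the sketch.

```python
import re

CLAIM_PHRASES = ("done", "fixed", "shipped", "ready", "in place")
CLAIM_RE = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in CLAIM_PHRASES) + r")\b", re.IGNORECASE
)
# Hypothetical markers for "this reply is explicitly partial or a question".
PARTIAL_RE = re.compile(r"\b(partial|not yet|in progress)\b|\?\s*$", re.IGNORECASE)


def gate_decision(message: str, tool_count: int, turn_ledger: list) -> str:
    """Return "BLOCK" when a completion claim has no verified receipt."""
    explicit = bool(CLAIM_RE.search(message))
    # Substantive-turn framing: lots of tool use plus a long reply reads as "done".
    implicit = (
        tool_count >= 3 and len(message) > 150 and not PARTIAL_RE.search(message)
    )
    if not (explicit or implicit):
        return "PASS"
    has_receipt = any(
        entry.get("type") == "VALIDATOR_PASS" and entry.get("sig_ok")
        for entry in turn_ledger
    )
    return "PASS" if has_receipt else "BLOCK"
```

A receipt whose signature did not verify counts the same as no receipt, which is why the sketch checks `sig_ok` rather than just the entry type.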

| Suite | Cases | Pass |
|---|---|---|
| test_d1_contract_intake.py | 16 (C16 collapsed into single regression check) | 16/16 |
| test_d4_d6_validators.py | 20 | 20/20 |
| test_d9_test_marker_guard.py | 11 | 11/11 |
| test_d11_block_dedupe.py | 3 | 3/3 |
| test_v3_fixes.py | 54 | 54/54 |
| Total | 104 | 104/104 |

Down from 158 in v5 because three test files (D7 schema-module, D8 L4-wiring, D10 requires_schema) were archived alongside the L4 code they covered. The remaining suites cover the substantive layer that survives.
- Old contracts may still carry requires_schema / response_schema / required_response_mode fields. The gate ignores them; they're harmless data.
- BRIAN_L4_MODE=shadow or enforce will silently fall through to the legacy prose-gate after CC restart (the env var is no longer read).

To bring L4 back, you'd have to:
1. Restore brian_l4_schema.py and l4_corpus_evaluator.py from _archive_260509_l4_retirement/.
2. Restore the wiring block in brian-evidence-gate.py (it's in git history before commit 5bcbb4f's successor).
3. Restore the requires_schema / response_schema / high_stakes_reason fields in brian-contract-intake.py.
4. Add BRIAN_L4_MODE env back to settings.json.
5. Restore the D7 / D8 / D10 test files.
Don't do this without a fresh decision. The retirement was deliberate.
The truth-and-effort layer makes "do the work, prove it, then claim it" the easiest path and "give up / dodge / fake-it" the hardest path. Substance enforcement, not style enforcement. Brian writes in his own voice; the layer just refuses to let Brian send a reply that claims a thing not backed by a signed receipt, or send a reply that asks Jonah for help without documented prior attempts. That's the whole game.