2026-05-03 00:07 (Beirut) (backfill from DOCUMENTATION/)

Brian Self-Knowledge System — 3-Review Synthesis

Three senior reviewers (R1, R2, R3) read the brief independently. This is what I take from them.

Convergent findings (3/3 — adopt without further debate)

Issue	All 3 say	Action
`data.md` is mis-shelved	Storage is an atom, not a composite. Move raw stores to layer 1; keep semantic wrappers as composites.	Phase 2A: split into `data_stores.md` (atom) + leave only the wrapper rows in `data.md` (or rename to surfaces).
Capability schema is anemic	`atom + composite + skill → outcome` needs more columns.	Add (at minimum): `cost_class`, `risk_level` / `side_effects` enum, `idempotency`, `freshness_budget`, `evidence_pointer`. Don't invent a DSL — extra columns.
Probes need structured output, not bool	One-liner exit codes can't drive the autofix matrix.	Define probe contract: `{ok, layer (transport/auth/quota/shape/runtime), latency_ms, evidence, hint}`. Probes still one-liners — just emit JSON. Brightdata fix proves it.
Append-only journal is mandatory	Without it, drift, audit, miner, and proposal review all break.	`/root/.claude/system/events/YYYY-MM-DD.ndjson`. Every probe, status flip, autofix, boundary block, capability invocation appends.
Sub-agent proposal inbox	Read-only is right for now, but pure read-only starves discovery.	`.inbox/` — sub-agents drop timestamped JSON proposals; main Brian processes at session boundary.
Freshness budgets per row	Probe-or-cascade-red is not enough; APIs deprecate silently.	Frontmatter declares per-row `max_age`; states become `verified-fresh / verified-stale / red / unknown` (4 states, not 2).
Boundary enforcement = data, not code	Frozen settings.json + self-modifying boundaries don't compose.	One generic PreToolUse hook reads `boundaries.md` (or its YAML form) at execution time. Brian editing boundaries is a file write — no harness restart. OPA-style.
Defer the generator (Phase 4)	Cross-product is garbage; tagged generation is premature without probes/boundaries first.	Skip Phase 4 entirely for now. Capabilities are mined from successful invocations in the journal, not generated.
Defer open-source	Too early. Sanitize after the runtime exists, not before.	Phase 5 → after Phase 2/3 are mechanical.
Markdown ≠ source of truth long-term	Human-readable is good; LLMs scanning 13 files every turn is expensive; structure is parse-fragile.	YAML or SQLite registry as truth, markdown generated as the view. Not week-1, but design now so Phase 2 doesn't lock us into markdown-as-DB.
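
To make the probe contract row concrete, here is a minimal Python sketch of a probe wrapper that emits the `{ok, layer, latency_ms, evidence, hint}` shape as one JSON line. The names `ProbeFailure`, `ProbeResult` and `run_probe` are hypothetical, as is the choice to report `layer` as null on success and to classify any unrecognized exception as `runtime`; none of that is fixed by the reviews.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Callable, Optional

# Failure layers named in the probe contract.
LAYERS = ("transport", "auth", "quota", "shape", "runtime")


class ProbeFailure(Exception):
    """Hypothetical exception a check raises to name the failing layer."""

    def __init__(self, layer: str, evidence: str, hint: str = ""):
        super().__init__(evidence)
        if layer not in LAYERS:
            raise ValueError(f"unknown layer: {layer}")
        self.layer, self.evidence, self.hint = layer, evidence, hint


@dataclass
class ProbeResult:
    ok: bool
    layer: Optional[str]  # assumption: None when the probe succeeded
    latency_ms: int
    evidence: str
    hint: str

    def to_json(self) -> str:
        # One JSON object per line, so the output can go straight into a journal.
        return json.dumps(asdict(self), sort_keys=True)


def run_probe(check: Callable[[], str]) -> ProbeResult:
    """Run a check and wrap its outcome in the probe contract."""
    start = time.monotonic()
    try:
        evidence, layer, hint = check(), None, ""
    except ProbeFailure as exc:
        evidence, layer, hint = exc.evidence, exc.layer, exc.hint
    except Exception as exc:  # anything unclassified counts as a runtime failure
        evidence, layer, hint = repr(exc), "runtime", "unclassified exception"
    latency_ms = int((time.monotonic() - start) * 1000)
    return ProbeResult(layer is None, layer, latency_ms, evidence, hint)
```

A probe stays a one-liner at the call site, for example `print(run_probe(my_check).to_json())`, while the autofix matrix can branch on `layer` instead of an exit code.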

Divergent calls (you decide)

Issue	R1	R2	R3	My read
Naming	Keep "cap-protocol" → ACP (Agent Capability Protocol)	Doesn't push hard	"cap" is loaded; capability-based-security is a 40-yr field — pick something else	R3 is right. The name conflict will burn us in OSS. Suggest "afford" / "task-affordance-protocol" / "agent-resource-graph".
What lives in layer 1 vs 2	Move `data.md` only	Reclassify almost everything (7 sub-files instead of 4 layers)	Move `data.md` AND `subsystems.md` to atoms; rename layer 2 to "surfaces"	R3 is most surgical. R2's 7-sub-file split is over-engineered. Move two files, rename layer 2.
Capability columns to add	cost/latency only	13 fields incl. preconditions/postconditions/rollback	5 fields incl. side-effects enum	R3's enum + R2's freshness/risk. Don't add 13. Add ~6: side_effects, risk, cost_class, idempotency, freshness_budget, last_invoked_at.
Boundary enforcement engine	hooks per rule	OPA + Rego	generic PreToolUse hook reading boundaries.md as data	R3. OPA is the right idea, but heavy for a one-person setup. A generic hook + JSON-rule format gets us there at 5% of the complexity.
Discovery mechanism (single CORE pointer vs belt+suspenders)	(didn't address)	(didn't address)	Skip the test, ship belt+suspenders day-one — single-pointer fails under load	R3 is right. Long-context attention degrades on single lines. Add SessionStart hook + /sys command now.
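
The "generic hook + JSON-rule format" call can be sketched as follows. This is only an illustration of rules-as-data: the rule fields (`id`, `tool`, `pattern`, `action`) and the function names are hypothetical, JSON is used instead of YAML to stay within the standard library, and how the harness passes the tool call to the hook and signals a block is deliberately left out.

```python
import json
import re
from pathlib import Path


def load_rules(path):
    """Re-read the rule file on every call, so an edit applies without a restart."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def evaluate(rules, tool_name, tool_input):
    """Return the first rule that matches the tool call, or None if it is allowed.

    Assumed rule shape: {"id": str, "tool": str or "*", "pattern": regex, "action": str}.
    The regex is searched in the JSON-serialized tool input.
    """
    haystack = json.dumps(tool_input, sort_keys=True)
    for rule in rules:
        if rule["tool"] in ("*", tool_name) and re.search(rule["pattern"], haystack):
            return rule
    return None
```

Because the hook body never changes, adding or tightening a boundary is a write to the rule file and nothing else, which is the property the frozen settings file needs.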

Things individual reviewers caught that I missed

Consultation is currently unfalsifiable (R3) — there's no audit that the system was actually used. Fix: invocation journal becomes the audit. Daily brief gates on "consultation rate," not "discovery effectiveness."
Instant-action vs consult-first is a real tension (R3) — must be resolved by making consultation a 50ms in-memory index lookup, not a 5-second 13-file recital.
The Brightdata fix should have generalized the schema (R3): `access.md` needs a `runtime` column (`npx|global|docker|systemd`) so the cache-sweep cohort is auto-derivable. Every postmortem should ask "what column was missing?"
"I cannot" needs more than two states (R2): yes / no / yes after probe / yes after approval / blocked by policy. Forcing a binary creates either helplessness or overreach.
Capability state is dynamic, not static (R2) — should be resolved at runtime: state = deps + freshness + policy + risk + context. A capctl resolve CLI is the unit of operational truth, not a markdown row.
Idempotency is critical for autofix (R2): restarting twice is fine; sending a WhatsApp message twice is not. The autofix matrix needs idempotency-aware classes, not a flat list.
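
The idempotency point can be illustrated with a small gate in front of the autofix matrix. This is a sketch under assumptions: the two classes, the `run` / `ask` / `stop` outcomes, the retry limit, and the policy of asking before any non-idempotent fix are all hypothetical choices, not something the reviewers specified.

```python
from enum import Enum


class Idempotency(Enum):
    IDEMPOTENT = "idempotent"          # safe to repeat, e.g. restarting a service
    NON_IDEMPOTENT = "non_idempotent"  # repeating has an effect, e.g. sending a message


def autofix_decision(idempotency, attempts_so_far, max_retries=2):
    """Return 'run', 'ask', or 'stop' for a proposed autofix action."""
    if idempotency is Idempotency.IDEMPOTENT:
        # Repeating is harmless, so retry up to the limit.
        return "run" if attempts_so_far < max_retries else "stop"
    # Non-idempotent fixes are never repeated automatically:
    # ask before the first attempt, stop after it.
    return "ask" if attempts_so_far == 0 else "stop"
```

With an `idempotency` column on each row, the autofix matrix can look the class up instead of hard-coding which fixes are safe to repeat.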

Revised Phase 2 scope (what I'll actually build)

Build:
1. Convert atoms (4 files) + capabilities.md + boundaries.md to structured registry — YAML or SQLite (decide: YAML for v1, easier to diff in git).
2. capctl CLI: probe, status, resolve <cap-id>, tail-events.
3. Probe runner — JSON contract {ok, layer, latency_ms, evidence, hint}. Wire 10 most-used capabilities first.
4. Append-only event journal /root/.claude/system/events/.
5. .inbox/ proposal mechanism for sub-agents.
6. Freshness-budget-aware state resolver (4 states + dependency cascade).
7. Generic PreToolUse boundary hook that reads boundaries.yaml at runtime; wire Tier-0 rules (money, paid models, Jonah's personal accounts, WA recipient).
8. Discovery: belt-and-suspenders day one (CORE pointer + SessionStart hook + /sys command).
9. Add runtime, cost_class, risk_level, side_effects, idempotency, freshness_budget columns to relevant rows.
10. Markdown is now generated from the registry, not authoritative.
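
Item 6 (the freshness-budget-aware resolver) is the least obvious piece of the build list, so here is a rough sketch of one way it could work. The registry row fields (`last_probe_at`, `last_probe_ok`, `max_age`, `deps`) are hypothetical, and so is the cascade rule used here: a capability takes the worst state among itself and its dependencies, ordered fresh < stale < unknown < red.

```python
from datetime import datetime, timedelta, timezone

# Assumed severity order, best to worst.
SEVERITY = ["verified-fresh", "verified-stale", "unknown", "red"]


def own_state(row, now):
    """State of a single row from its own probe history and max_age budget."""
    if row.get("last_probe_at") is None:
        return "unknown"                      # never probed
    if not row["last_probe_ok"]:
        return "red"                          # last probe failed
    age = now - row["last_probe_at"]
    return "verified-fresh" if age <= row["max_age"] else "verified-stale"


def resolve(cap_id, registry, now, _seen=frozenset()):
    """Resolve a capability's state, cascading the worst state up from its deps."""
    if cap_id in _seen or cap_id not in registry:
        return "unknown"                      # cycle or missing row
    row = registry[cap_id]
    states = [own_state(row, now)]
    for dep in row.get("deps", []):
        states.append(resolve(dep, registry, now, _seen | {cap_id}))
    return max(states, key=SEVERITY.index)
```

A `capctl resolve <cap-id>` command could then be a thin wrapper that loads the registry, calls `resolve`, and prints the state.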

Defer:
- Phase 4 generator (replace with miner driven by event journal — much later).
- Phase 5 open-source (after Phase 2 + 3 prove load-bearing for 4+ weeks).
- Layer-2 over-decomposition (R2's 7 sub-files). Stick with 4 layers, just fix the misclassifications.

Cut:
- The "consult before every non-trivial action" rule as currently written — replace with "every Tier-1+ action must call capctl resolve <id> and the call appends to the event journal." Mechanical, not behavioral.
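
What "mechanical, not behavioral" could look like in practice: every resolve and every invocation appends one line to the daily NDJSON file, and the consultation rate falls out of the journal. This is a sketch only; the event fields (`ts`, `kind`, `cap_id`), the `resolve` / `invoke` kinds, and the rule that one resolve covers one later invocation of the same capability are assumptions.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_event(events_dir, kind, now=None, **fields):
    """Append one JSON line to <events_dir>/YYYY-MM-DD.ndjson and return the path."""
    now = now or datetime.now(timezone.utc)
    path = Path(events_dir) / f"{now:%Y-%m-%d}.ndjson"
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {"ts": now.isoformat(), "kind": kind, **fields}
    with path.open("a", encoding="utf-8") as fh:  # append mode: never rewrite history
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return path


def consultation_rate(events):
    """Share of invocations preceded by a resolve event for the same cap_id."""
    resolved, invoked, consulted = set(), 0, 0
    for ev in events:
        if ev["kind"] == "resolve":
            resolved.add(ev["cap_id"])
        elif ev["kind"] == "invoke":
            invoked += 1
            if ev["cap_id"] in resolved:
                consulted += 1
                resolved.discard(ev["cap_id"])  # each resolve covers one invocation
    return consulted / invoked if invoked else None
```

The daily brief can then report `consultation_rate` over the day's file instead of relying on the rule being remembered.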

Open questions for you (Jonah)

Name: keep "cap-protocol" or pick something without the capability-security baggage? My vote: rename. Candidates: afford-protocol, agent-resource-graph, task-affordance-registry. Want me to push another ask form for this?
Registry format: YAML files in git, or SQLite? YAML wins on diff/PR/human-edit; SQLite wins on query/concurrent-write/scale. My vote: YAML for now, SQLite if/when we have >200 capabilities.
Event journal scope: every probe + every Tier-0/1 invocation, or also every read-only capctl resolve call? My vote: every resolve too — that's the consultation audit R3 wants.
Phase 2 build approach: ship in one bigger commit (1-2 sessions) or break into 5 micro-phases (probe contract → resolver → inbox → hook → events)? My vote: 5 micro-phases, each shippable.

Bottom line

The architecture survived three senior reviews intact. The corrections are mechanical, not conceptual: structured probes, structured boundaries, structured journal, generic hook reading data, miner-not-generator. The system shifts from "documentation Brian reads" to "operational control plane Brian queries." That's the load-bearing change.

Ready to start Phase 2 micro-phase 1 (registry conversion + capctl resolve) on your green light.