2026-05-03 00:07 (Beirut) (backfill from DOCUMENTATION/)

Brian Self-Knowledge System — 3-Review Synthesis

Three senior reviewers (R1, R2, R3) read the brief independently. This is what I take from them.

Convergent findings (3/3 — adopt without further debate)

| Issue | All 3 say | Action |
|---|---|---|
| data.md is mis-shelved | Storage is an atom, not a composite. Move raw stores to layer 1; keep semantic wrappers as composites. | Phase 2A: split into data_stores.md (atom) + leave only the wrapper rows in data.md (or rename to surfaces). |
| Capability schema is anemic | atom + composite + skill → outcome needs more columns. | Add (at minimum): cost_class, risk_level / side_effects enum, idempotency, freshness_budget, evidence_pointer. Don't invent a DSL, just extra columns. |
| Probes need structured output, not bool | One-liner exit codes can't drive the autofix matrix. | Define probe contract: {ok, layer (transport/auth/quota/shape/runtime), latency_ms, evidence, hint}. Probes still one-liners, just emit JSON. Brightdata fix proves it. |
| Append-only journal is mandatory | Without it, drift, audit, miner, and proposal review all break. | /root/.claude/system/events/YYYY-MM-DD.ndjson. Every probe, status flip, autofix, boundary block, capability invocation appends. |
| Sub-agent proposal inbox | Read-only is right for now, but pure read-only starves discovery. | .inbox/: sub-agents drop timestamped JSON proposals; main Brian processes at session boundary. |
| Freshness budgets per row | Probe-or-cascade-red is not enough; APIs deprecate silently. | Frontmatter declares per-row max_age; states become verified-fresh / verified-stale / red / unknown (4 states, not 2). |
| Boundary enforcement = data, not code | Frozen settings.json + self-modifying boundaries don't compose. | One generic PreToolUse hook reads boundaries.md (or its YAML form) at execution time. Brian editing boundaries is a file write, no harness restart. OPA-style. |
| Defer the generator (Phase 4) | Cross-product is garbage; tagged generation is premature without probes/boundaries first. | Skip Phase 4 entirely for now. Capabilities are mined from successful invocations in the journal, not generated. |
| Defer open-source | Too early. Sanitize after the runtime exists, not before. | Phase 5 → after Phase 2/3 are mechanical. |
| Markdown ≠ source of truth long-term | Human-readable is good; LLMs scanning 13 files every turn is expensive; structure is parse-fragile. | YAML or SQLite registry as truth, markdown generated as the view. Not week-1, but design now so Phase 2 doesn't lock us into markdown-as-DB. |
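
A minimal sketch of the probe contract from the findings above, assuming probes are wrapped in Python. Only the five field names and the five layer values come from the brief; `ProbeResult`, `ProbeFailure`, `run_probe`, and the choice to report `runtime` as the layer on success are hypothetical.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Callable

# Layer values taken from the probe contract in this document.
LAYERS = ("transport", "auth", "quota", "shape", "runtime")


@dataclass
class ProbeResult:
    ok: bool
    layer: str       # failing layer, or deepest layer reached on success
    latency_ms: int
    evidence: str    # short raw excerpt supporting the verdict
    hint: str        # suggested next step for the autofix matrix

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


class ProbeFailure(Exception):
    """Hypothetical exception a check raises to name the failing layer."""

    def __init__(self, layer: str, evidence: str, hint: str = ""):
        super().__init__(evidence)
        if layer not in LAYERS:
            raise ValueError(f"unknown layer: {layer}")
        self.layer, self.evidence, self.hint = layer, evidence, hint


def run_probe(check: Callable[[], str]) -> ProbeResult:
    """Run a check and wrap its outcome in the structured contract."""
    start = time.monotonic()
    try:
        evidence = check()
        # Assumption: a passing probe reports the deepest layer, "runtime".
        ok, layer, hint = True, "runtime", ""
    except ProbeFailure as exc:
        ok, layer, evidence, hint = False, exc.layer, exc.evidence, exc.hint
    latency_ms = int((time.monotonic() - start) * 1000)
    return ProbeResult(ok, layer, latency_ms, evidence, hint)
```

A probe stays a one-liner at the call site (`print(run_probe(check).to_json())`), while the autofix matrix can branch on `layer` rather than on an exit code.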

Divergent calls (you decide)

| Issue | R1 | R2 | R3 | My read |
|---|---|---|---|---|
| Naming | Keep "cap-protocol" → ACP (Agent Capability Protocol) | Doesn't push hard | "cap" is loaded; capability-based-security is a 40-yr field, so pick something else | R3 is right. The name conflict will burn us in OSS. Suggest "afford" / "task-affordance-protocol" / "agent-resource-graph". |
| What lives in layer 1 vs 2 | Move data.md only | Reclassify almost everything (7 sub-files instead of 4 layers) | Move data.md AND subsystems.md to atoms; rename layer 2 to "surfaces" | R3 is most surgical. R2's 7-sub-file split is over-engineered. Move two files, rename layer 2. |
| Capability columns to add | cost/latency only | 13 fields incl. preconditions/postconditions/rollback | 5 fields incl. side-effects enum | R3's enum + R2's freshness/risk. Don't add 13. Add ~6: side_effects, risk, cost_class, idempotency, freshness_budget, last_invoked_at. |
| Boundary enforcement engine | hooks per rule | OPA + Rego | generic PreToolUse hook reading boundaries.md as data | R3. OPA is the right idea, but heavy for a one-person setup. Generic hook + JSON-rule format gets us there at 5% of the complexity. |
| Discovery mechanism (single CORE pointer vs belt+suspenders) | (didn't address) | (didn't address) | Skip the test, ship belt+suspenders day-one; single-pointer fails under load | R3 is right. Long-context attention degrades on single lines. Add SessionStart hook + /sys command now. |
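
The "generic hook + JSON-rule format" option could look roughly like the sketch below. The rule schema (`id`, `tool`, `field`, `pattern`, `action`), the function names, and the rules file path passed to `main` are all hypothetical. The stdin payload fields (`tool_name`, `tool_input`) and the convention that exit code 2 blocks the call are assumptions about how the harness's PreToolUse hooks behave and should be checked against its hook documentation.

```python
import json
import re
import sys
from typing import Any


def load_rules(path: str) -> list[dict[str, Any]]:
    """Load boundary rules from a JSON file (hypothetical format)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def evaluate(
    rules: list[dict[str, Any]], tool_name: str, tool_input: dict[str, Any]
) -> tuple[str, str]:
    """Return (decision, rule_id). First matching rule wins; default allow."""
    for rule in rules:
        if rule["tool"] not in ("*", tool_name):
            continue
        value = str(tool_input.get(rule["field"], ""))
        if re.search(rule["pattern"], value):
            return rule["action"], rule["id"]
    return "allow", ""


def main(rules_path: str) -> int:
    """Hook entry point: read the tool call from stdin, return an exit code."""
    payload = json.load(sys.stdin)
    # Rules are re-read on every call, so editing the rules file takes
    # effect without restarting the harness.
    rules = load_rules(rules_path)
    decision, rule_id = evaluate(
        rules, payload.get("tool_name", ""), payload.get("tool_input", {})
    )
    if decision == "block":
        print(f"blocked by boundary rule {rule_id}", file=sys.stderr)
        return 2  # assumed "block this tool call" exit code
    return 0
```

Because the rules are plain data, adding a boundary is a file write, and `evaluate` can be unit-tested without the harness.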

Things the reviewers caught that I missed

  1. Consultation is currently unfalsifiable (R3) — there's no audit that the system was actually used. Fix: invocation journal becomes the audit. Daily brief gates on "consultation rate," not "discovery effectiveness."
  2. Instant-action vs consult-first is a real tension (R3) — must be resolved by making consultation a 50ms in-memory index lookup, not a 5-second 13-file recital.
  3. The Brightdata fix should have generalized the schema (R3) — needs a runtime: npx|global|docker|systemd column on access.md so the cache-sweep cohort is auto-derivable. Every postmortem should ask "what column was missing?"
  4. "I cannot" needs more than two states (R2): yes / no / yes after probe / yes after approval / blocked by policy. Forcing binary creates either helplessness or overreach.
  5. Capability state is dynamic, not static (R2) — should be resolved at runtime: state = deps + freshness + policy + risk + context. A capctl resolve CLI is the unit of operational truth, not a markdown row.
  6. Idempotency is critical for autofix (R2) — restart twice = fine; send WhatsApp twice ≠ fine. Auto-fix matrix needs idempotency-aware classes, not flat list.
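
Points 4 and 6 can be combined in one small sketch: a five-valued answer instead of a binary one, and an autofix gate keyed on idempotency. The five answers are the ones listed in point 4; the enum and function names, the input flags, the precedence order in `decide`, and the `max_retries` default are hypothetical.

```python
from enum import Enum


class Answer(Enum):
    YES = "yes"
    NO = "no"
    YES_AFTER_PROBE = "yes after probe"
    YES_AFTER_APPROVAL = "yes after approval"
    BLOCKED_BY_POLICY = "blocked by policy"


class Idempotency(Enum):
    IDEMPOTENT = "idempotent"          # e.g. restarting a service twice
    NON_IDEMPOTENT = "non_idempotent"  # e.g. sending a message twice


def decide(
    exists: bool, policy_blocked: bool, needs_approval: bool, probe_fresh: bool
) -> Answer:
    """Map capability facts to an answer. Precedence order is an assumption."""
    if not exists:
        return Answer.NO
    if policy_blocked:
        return Answer.BLOCKED_BY_POLICY
    if needs_approval:
        return Answer.YES_AFTER_APPROVAL
    if not probe_fresh:
        return Answer.YES_AFTER_PROBE
    return Answer.YES


def may_autofix(
    idempotency: Idempotency, attempts_so_far: int, max_retries: int = 2
) -> bool:
    """Idempotent fixes may be retried; non-idempotent ones run at most once."""
    if idempotency is Idempotency.IDEMPOTENT:
        return attempts_so_far < max_retries
    return attempts_so_far == 0
```

Under this sketch a restart can be attempted again after a failure, while a message send is never repeated automatically.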

Revised Phase 2 scope (what I'll actually build)

Build:
1. Convert atoms (4 files) + capabilities.md + boundaries.md to structured registry — YAML or SQLite (decide: YAML for v1, easier to diff in git).
2. capctl CLI: probe, status, resolve <cap-id>, tail-events.
3. Probe runner — JSON contract {ok, layer, latency_ms, evidence, hint}. Wire 10 most-used capabilities first.
4. Append-only event journal /root/.claude/system/events/.
5. .inbox/ proposal mechanism for sub-agents.
6. Freshness-budget-aware state resolver (4 states + dependency cascade).
7. Generic PreToolUse boundary hook that reads boundaries.yaml at runtime; wire Tier-0 rules (money, paid models, Jonah's personal accounts, WA recipient).
8. Discovery: belt-and-suspenders day one (CORE pointer + SessionStart hook + /sys command).
9. Add runtime, cost_class, risk_level, side_effects, idempotency, freshness_budget columns to relevant rows.
10. Markdown is now generated from the registry, not authoritative.
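
Build item 6 (the freshness-budget-aware resolver) might be sketched as follows. The four state names come from this document; the `Row` fields, the severity ordering, and the rule that a row inherits the worst state among itself and its dependencies are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Severity order is an assumption: a higher number is worse.
SEVERITY = {"verified-fresh": 0, "verified-stale": 1, "unknown": 2, "red": 3}


@dataclass
class Row:
    id: str
    max_age_s: float                       # freshness budget for this row
    last_ok: Optional[bool] = None         # None means never probed
    last_probe_ts: Optional[float] = None  # epoch seconds of the last probe
    deps: list[str] = field(default_factory=list)


def own_state(row: Row, now: float) -> str:
    """State of a single row, ignoring its dependencies."""
    if row.last_ok is None or row.last_probe_ts is None:
        return "unknown"
    if not row.last_ok:
        return "red"
    age = now - row.last_probe_ts
    return "verified-fresh" if age <= row.max_age_s else "verified-stale"


def resolve(
    row_id: str, rows: dict[str, Row], now: float,
    _seen: frozenset = frozenset(),
) -> str:
    """Worst state among the row and everything it depends on."""
    if row_id not in rows or row_id in _seen:
        return "unknown"  # missing row or dependency cycle
    row = rows[row_id]
    states = [own_state(row, now)]
    states += [resolve(d, rows, now, _seen | {row_id}) for d in row.deps]
    return max(states, key=SEVERITY.__getitem__)
```

With this ordering, a capability whose own probe passed still resolves to `red` when any dependency is red, which is the cascade behaviour the brief asks for.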

Defer:
- Phase 4 generator (replace with miner driven by event journal — much later).
- Phase 5 open-source (after Phase 2 + 3 prove load-bearing for 4+ weeks).
- Layer-2 over-decomposition (R2's 7 sub-files). Stick with 4 layers, just fix the misclassifications.

Cut:
- The "consult before every non-trivial action" rule as currently written — replace with "every Tier-1+ action must call capctl resolve <id> and the call appends to the event journal." Mechanical, not behavioral.
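
One way to make that rule mechanical is to have both the resolve call and the invocation append to the journal, then compute the consultation rate from the journal alone. In this sketch the file naming (YYYY-MM-DD.ndjson) follows the brief; the event fields (`type`, `cap_id`, `ts`), the event type names, and the function names are hypothetical, and tier filtering is left out for brevity.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_event(events_dir: Path, event: dict) -> Path:
    """Append one event as a JSON line to today's journal file."""
    now = datetime.now(timezone.utc)
    events_dir.mkdir(parents=True, exist_ok=True)
    path = events_dir / f"{now:%Y-%m-%d}.ndjson"
    record = {"ts": now.isoformat(), **event}
    with path.open("a", encoding="utf-8") as fh:  # append-only by convention
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return path


def read_events(path: Path) -> list[dict]:
    """Read a journal file back into a list of events, in order."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def consultation_rate(events: list[dict]) -> float:
    """Share of invocations preceded by a resolve of the same capability.

    Each resolve covers one later invocation; returns 1.0 when there
    were no invocations at all.
    """
    resolved: set[str] = set()
    invoked = consulted = 0
    for ev in events:
        if ev.get("type") == "resolve":
            resolved.add(ev["cap_id"])
        elif ev.get("type") == "invocation":
            invoked += 1
            if ev["cap_id"] in resolved:
                consulted += 1
                resolved.discard(ev["cap_id"])
    return consulted / invoked if invoked else 1.0
```

A daily brief could then gate on `consultation_rate(read_events(path))` for the day's file instead of relying on self-reported behaviour.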

Open questions for you (Jonah)

  1. Name: keep "cap-protocol" or pick something without the capability-security baggage? My vote: rename. Candidates: afford-protocol, agent-resource-graph, task-affordance-registry. Want me to push another ask form for this?
  2. Registry format: YAML files in git, or SQLite? YAML wins on diff/PR/human-edit; SQLite wins on query/concurrent-write/scale. My vote: YAML for now, SQLite if/when we have >200 capabilities.
  3. Event journal scope: every probe + every Tier-0/1 invocation, or also every read-only capctl resolve call? My vote: every resolve too — that's the consultation audit R3 wants.
  4. Phase 2 build approach: ship in one bigger commit (1-2 sessions) or break into 5 micro-phases (probe contract → resolver → inbox → hook → events)? My vote: 5 micro-phases, each shippable.

Bottom line

The architecture survived three senior reviews intact. The corrections are mechanical, not conceptual: structured probes, structured boundaries, structured journal, generic hook reading data, miner-not-generator. The system shifts from "documentation Brian reads" to "operational control plane Brian queries." That's the load-bearing change.

Ready to start Phase 2 micro-phase 1 (registry conversion + capctl resolve) on your green light.