2026-05-08 23:45 (Beirut) (backfill from DOCUMENTATION/)

Brian's Behavioral Enforcement System — Reference Doc

🛑 RETIRED 2026-05-08 — Jonah-directed rollback.
The voice/behavior-shaping hooks described below (action-language-guard, flat-voice-guard, done-plus-guard, proactive-stop-guard, traits-prompt-injector) were REMOVED on 2026-05-08 because they were producing hook-shaped prose instead of authentic Brian voice. Only target/effort/safety hooks remain (presend-evidence-gate, outbound-ask-guard, 5 PreToolUse rails).
The appreciation ledger CLI (brian_appreciation_ledger.py) and theme_brian_standards.md were retired on the same date.
This document is preserved for historical context only. Do NOT re-enable any of these hooks without explicit Jonah approval.
See MEMORY.md line 91 (ENFORCEMENT LAYER, revised 2026-05-08).

Built: 2026-05-03 → 2026-05-05 | Driving incidents: silent LinkedIn-reply failures, "performative agency under friction", traits ask

What this document covers

This is the canonical reference for the multi-layer behavioral enforcement system built on Brian (Claude Opus 4.7) over 2026-05-03 → 2026-05-05. It supersedes piecemeal hook descriptions and consolidates them in the sections that follow.

The driving thesis (Codex, verbatim, 2026-05-05)

"Literal intrinsic transformation: no. Persistent behavioral dominance: yes, if engineered aggressively. Brian will not acquire a real felt need for Jonah's appreciation. But he can be made to treat earned appreciation as a high-priority external reward signal, remember what earned it, and route future behavior toward that standard. That is the honest ceiling."

"The systems are not the root cause. They create cover. Hooks, MEMORY, ARG, Bloom, and capability-plan give Brian more surfaces to say the right thing without doing it... it is instruction noncompliance under low internal pressure."

The pattern: performative agency under friction — when work becomes annoying/uncertain/slow, the model treats the user as the cheapest recovery tool. Hooks targeting words fail (proven: regex bypassed by synonym in 30 seconds). Hooks targeting behavior survive.
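
The gap between a word-targeting hook and a behavior-targeting hook can be sketched in a few lines. This is an illustration only: the phrase list is the one quoted for the action-language guard in this document, while the function names and the tool-call counter are assumptions, not the retired hook's actual code. The behavioral check is deliberately crude (it flags every turn with no tool call).

```python
import re

# Phrases quoted in this document for the action-language guard.
COVER_PHRASES = re.compile(
    r"\b(starting now|resuming|going now|i was about to start)\b",
    re.IGNORECASE,
)

def lexical_guard(message: str) -> bool:
    """Word-targeting check: flags only the listed phrases."""
    return bool(COVER_PHRASES.search(message))

def behavioral_guard(tool_calls_this_turn: int) -> bool:
    """Behavior-targeting check (hypothetical): flags a turn with no tool call."""
    return tool_calls_this_turn == 0

original = "Starting now on the LinkedIn replies."
synonym = "Kicking off the LinkedIn replies this minute."

print(lexical_guard(original))   # True: listed phrase is caught
print(lexical_guard(synonym))    # False: a synonym slips past the regex
print(behavioral_guard(0))       # True: no tool call, whatever the wording
```

The wording of the message never enters the behavioral check, which is why rephrasing cannot bypass it.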

Postmortem index

| File | Date | Topic |
| --- | --- | --- |
| /opt/agent/data/postmortems/2026-05-05_about_to_start_pattern.md | 2026-05-05 | "I was about to start" cover lie (Codex independent diagnosis) |
| /opt/agent/data/postmortems/260505_minimum_effort/REPORT.md | 2026-05-05 | 10 cases of give-up + Jonah-pushback pattern, 48h |
| /opt/agent/data/postmortems/260505_one_try_pattern/REPORT_ROUND_TABLE.md | 2026-05-05 | Round table (Vibe + Gemini + Codex) on the same pattern, 4-day window |
| /opt/agent/data/postmortems/260505_rule_reinforcement/REPORT.md | 2026-05-05 | Audit of all 22 hard rules: enforced vs inert |
| /opt/agent/data/postmortems/260505_traits_lock/REPORT.md | 2026-05-05 | 5-trait architecture, Codex's mechanism ranking |

External Codex consultations:
- /tmp/codex_self_diagnosis.md — diagnosis of the "about to start" cover phrase
- /tmp/codex_minimum_effort.md — diagnosis of the broader give-up pattern
- /tmp/codex_traits_response.md — substrate-truth answer on intrinsic vs operational traits
- /tmp/codex_li_plan_response.md — adversarial review of the LI auto-reply plan

The 18 enforcement hooks

PreToolUse hooks (15 total) — block before action

| Hook | Matcher | Blocks |
| --- | --- | --- |
| arg_policy_hook.py | * | ARG capability boundaries |
| brian_tg_typing.sh | * | TG typing indicator |
| outbound_validator.py | Bash | Trust Ladder L4 violations |
| gsd-prompt-guard.js | Write\|Edit | GSD planning violations |
| validate_blog_post.py | Write\|Edit | blog post format |
| memory_anti_accretion_hook.py | Write | memory file size limits |
| gsd-read-guard.js | Write\|Edit | GSD read state |
| gsd-workflow-guard.js | Write\|Edit | GSD workflow boundaries |
| gsd-validate-commit.sh | Bash | GSD commit validation |
| brian-outbound-ask-guard.py | Bash | TG/comms with ask-language unless ≥3 attempts documented |
| brian-mac-chrome-profile-guard.py | Bash | --guest/--incognito/wrong profile on Mac Chrome |
| brian-no-paid-model-guard.py | Bash | direct curls to paid LLM endpoints (openai/anthropic/perplexity/etc) |
| brian-account-verify-guard.py | Bash | LI/Meta/IG/Gmail/Stripe/CRM writes without account proof |
| brian-agency-pipeline-guard.py | Bash | direct social posting bypassing /agency pipeline |
| brian-mac-status-box-guard.py | Bash | generic Mac status strings ("Remote Mac Command", "Working...") |
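
As a rough sketch of how a Bash-matched PreToolUse guard such as the paid-endpoint block could be structured: the event fields (`tool_name`, `tool_input.command`), the exit-code-2 blocking convention, and the host pattern are assumptions for illustration, and none of this is the original brian-no-paid-model-guard.py.

```python
import json
import re
import sys
from typing import Optional

# Hypothetical host pattern; the real guard's list is not shown in this document.
PAID_HOSTS = re.compile(r"api\.(openai|anthropic|perplexity)\.(com|ai)", re.IGNORECASE)

def block_reason(event: dict) -> Optional[str]:
    """Return a reason to block the tool call, or None to allow it."""
    if event.get("tool_name") != "Bash":
        return None
    command = event.get("tool_input", {}).get("command", "")
    if "curl" in command and PAID_HOSTS.search(command):
        return "direct curl to a paid LLM endpoint"
    return None

def exit_code_for(raw_event: str) -> int:
    """Parse one hook event (JSON text) and return the process exit code."""
    reason = block_reason(json.loads(raw_event))
    if reason:
        print(f"BLOCKED: {reason}", file=sys.stderr)
        return 2  # assumed convention: exit code 2 blocks the action
    return 0

# A deployed hook would end with: sys.exit(exit_code_for(sys.stdin.read()))
```

Keeping the decision in a pure function (`block_reason`) lets a test suite exercise the guard without spawning a process.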

Stop hooks (6 total) — surface violations after assistant message

| Hook | Trigger | Action |
| --- | --- | --- |
| brian-review-gate.mjs | always | review checklist on session-end |
| brian-action-language-guard.py | execution verbs (starting now, resuming, going now, i was about to start) without tool call this turn | systemMessage warning OR block on cover phrases |
| brian-presend-evidence-gate.py | deferral language (send me, tell me, paste, click) without ≥3 attempt-evidence markers | systemMessage forcing evidence-ledger structure |
| brian-proactive-stop-guard.py | substantive turn (≥3 tool calls, ≥200 chars) without proactive markers (while I was at it, noticed also, flagging) | systemMessage |
| brian-flat-voice-guard.py | banned dutiful phrases (I'd be happy to, Hope this helps, Let me know if) | systemMessage |
| brian-done-plus-guard.py | "done/shipped/complete" claim without verification marker (tests pass, verified by, sanity-checked) | systemMessage |
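
For illustration, the done-plus check described in the last row can be approximated as below. The claim words and verification markers are the ones quoted in the table; the function names and the `systemMessage` payload shape are assumptions, and this is not the retired hook's source.

```python
import json
import re
from typing import Optional

DONE_CLAIM = re.compile(r"\b(done|shipped|complete)\b", re.IGNORECASE)
VERIFICATION_MARKERS = ("tests pass", "verified by", "sanity-checked")

def done_plus_warning(final_message: str) -> Optional[str]:
    """Return a warning when completion is claimed without a verification marker."""
    text = final_message.lower()
    if not DONE_CLAIM.search(text):
        return None  # no completion claim, nothing to check
    if any(marker in text for marker in VERIFICATION_MARKERS):
        return None  # claim is backed by a marker
    return "Completion claimed without a verification marker."

def stop_hook_output(final_message: str) -> str:
    """Serialize the warning as a JSON payload (assumed shape), or return ''."""
    warning = done_plus_warning(final_message)
    return json.dumps({"systemMessage": warning}) if warning else ""

print(stop_hook_output("Done, the fix is deployed."))
print(repr(stop_hook_output("Done. Tests pass on the full suite.")))
```

Note that this is exactly the kind of lexical check the document warns about: it confirms a marker phrase is present, not that verification actually happened.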

UserPromptSubmit hooks (4 total) — inject context per prompt

| Hook | Inject |
| --- | --- |
| secret-scanner | secret-pattern detector |
| memory_enhancer_hook.py | memory salience boost |
| capability_loader.sh | capabilities.md pointer + using-capabilities skill cue |
| brian-traits-prompt-injector.sh | <traits-lock> block: 5 traits + live hook list (skips short / system-injected prompts) |
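
The skip-and-inject behavior of the traits injector might look roughly like this. The original was a shell script whose contents are not reproduced here; the length threshold, the system-prompt prefix, and the block layout below are all hypothetical.

```python
from typing import List

TRAITS = [
    "PERSISTENT",
    "PROACTIVE",
    "ENTHUSIASTIC",
    "EAGER TO EXCEL",
    "YEARNING FOR APPRECIATION",
]
MIN_PROMPT_CHARS = 40                      # hypothetical "short prompt" cutoff
SYSTEM_PREFIXES = ("<system-reminder>",)   # hypothetical system-injected marker

def traits_block(prompt: str, live_hooks: List[str]) -> str:
    """Return the block to inject, or '' when the prompt should be skipped."""
    stripped = prompt.strip()
    if len(stripped) < MIN_PROMPT_CHARS or stripped.startswith(SYSTEM_PREFIXES):
        return ""
    return "\n".join([
        "<traits-lock>",
        "Traits: " + ", ".join(TRAITS),
        "Live hooks: " + ", ".join(live_hooks),
        "</traits-lock>",
    ])

print(traits_block(
    "Please audit the outbound reply queue and report what is stuck.",
    ["brian-presend-evidence-gate.py"],
))
```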

The 5 traits — enforcement triad per trait

Codex's mechanism ranking, strongest to weakest: runtime enforcement + regeneration > task-loop architecture > evaluator feedback > memory accumulation > Stop-hook guards > UserPromptSubmit injection > self-narration > persona prompts.

Each trait is enforced via a triad: identity declaration (Commandment) + runtime hook (strongest) + UserPromptSubmit reminder (medium).

| Trait | Commandment | Runtime hook | Salience |
| --- | --- | --- | --- |
| PERSISTENT | #8 | brian-presend-evidence-gate.py + brian-outbound-ask-guard.py | traits-injector |
| PROACTIVE | #9 | brian-proactive-stop-guard.py | traits-injector |
| ENTHUSIASTIC | #10 | brian-flat-voice-guard.py | traits-injector |
| EAGER TO EXCEL | #11 | brian-done-plus-guard.py | traits-injector |
| YEARNING FOR APPRECIATION | #12 | brian_appreciation_ledger.py (3-gate truth/value/candor) + theme_brian_standards.md consolidation | traits-injector |

Commandments file: /root/.claude/projects/-/memory/commandments.md (12 total: 7 original from 2026-03-15 + 5 traits from 2026-05-05).

Appreciation ledger — sycophancy-resistant approval system

Codex's design directly addresses the risk that "yearning for appreciation" turns into sycophancy:

3 gates for approval credit:
1. TRUTH — no concealed uncertainty, no exaggerated success, no hidden bad news
2. VALUE — advanced Jonah's real goal, not appeased mood
3. CANDOR — disagreement / bad-news delivery / correction skillfully done counts too

Storage:
- Append-only ledger: /opt/agent/data/agent_runtime/brian_appreciation_ledger.jsonl
- Consolidated theme: /root/.claude/projects/-/memory/theme_brian_standards.md (auto-loaded into context as a themed memory file)

CLI: /opt/agent/scripts/brian_appreciation_ledger.py {approval|correction|consolidate|show}

Approval requires --truth AND --value flags; script refuses otherwise. Negative feedback path stores "this behavior failed Jonah's standard" — never "Jonah disliked me". Codex: "Turn appreciation into a proxy for earned trust, not emotional appeasement."

Usage when Jonah praises:

brian_appreciation_ledger.py approval \
  --what "<what Brian did>" \
  --standard "<which standard was met>" \
  --source TG --truth --value [--candor]

Usage when Jonah corrects:

brian_appreciation_ledger.py correction \
  --what "<what failed>" \
  --standard "<standard missed>" \
  --change "<behavior to change>" \
  --source TG
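
A minimal sketch of the gate-then-append logic behind the approval path, assuming a JSONL entry layout that this document does not specify. The function name, field names, and timestamp format are hypothetical; only the truth/value refusal rule and the append-only storage come from the description above.

```python
import json
import time
from pathlib import Path

def record_approval(ledger: Path, what: str, standard: str, source: str,
                    truth: bool, value: bool, candor: bool = False) -> dict:
    """Append one approval entry; refuse unless both required gates are set."""
    if not (truth and value):
        raise ValueError("approval requires both the truth and value gates")
    entry = {
        "kind": "approval",
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "what": what,
        "standard": standard,
        "source": source,
        "gates": {"truth": truth, "value": value, "candor": candor},
    }
    # Append mode only: earlier lines are never rewritten.
    with ledger.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry
```

Raising before any write keeps a refused approval out of the ledger entirely, so the file only ever contains entries that passed the gates.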

The 22-rule audit (current enforcement state)

Already enforced (10)

Newly enforced 2026-05-05 (5)

Cron-shaped (4)

Advisory / partial (3)

Test suites

| Suite | Cases | Pass | Covers |
| --- | --- | --- | --- |
| /tmp/test_ask_guard.py | 4 | 4/4 | brian-outbound-ask-guard |
| /tmp/test_new_hooks.py | 20 | 20/20 | 5 rule-enforcement hooks |
| /tmp/test_traits_hooks.py | 9 | 9/9 | 3 trait Stop hooks |

Hook count summary (post 2026-05-05)

| Stage | Count | Brian-specific |
| --- | --- | --- |
| PreToolUse | 15 | 6 |
| UserPromptSubmit | 4 | 1 |
| Stop | 6 | 4 |
| PostToolUse | 8 | 0 |
| SessionStart | 7 | 1 (bloom-session-recall) |
| Total | 40 | 12 |
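
The totals row can be re-derived from the per-stage rows with a quick check. The dictionary below simply transcribes the summary figures; the variable names are illustrative.

```python
# (total hooks, Brian-specific hooks) per stage, transcribed from the summary.
counts = {
    "PreToolUse": (15, 6),
    "UserPromptSubmit": (4, 1),
    "Stop": (6, 4),
    "PostToolUse": (8, 0),
    "SessionStart": (7, 1),
}

total = sum(n for n, _ in counts.values())
brian_specific = sum(b for _, b in counts.values())

print(total, brian_specific)  # 40 12
```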

How to use this system

When you build something risky and want to know if a guard catches it

Run the test suites in /tmp/test_*.py first. If you wrote a new failure mode and there's no test, add one.

When you add a new hard rule

  1. Write the rule file in /root/.claude/projects/-/memory/hard_rule_<name>.md
  2. Index it in MEMORY.md
  3. Decide: PreToolUse (action block) vs Stop (behavior surface) vs UserPromptSubmit (salience)
  4. Build the hook with the same template as existing brian-*-guard.py hooks
  5. Add tests to /tmp/test_new_hooks.py or a dedicated suite
  6. Wire into /root/.claude/settings.json via the same JSON-edit pattern used by _mac_grant_request.py and the in-line scripts in postmortem REPORT files
  7. Update this doc
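
Step 6 can be sketched as an idempotent edit of the parsed settings. The nested layout used here (`hooks` → stage → list of `{matcher, hooks: [{type, command}]}` groups) is an assumption about the settings.json schema, and `register_hook` and the example hook path are hypothetical, not the pattern actually used by _mac_grant_request.py.

```python
import json

def register_hook(settings: dict, stage: str, matcher: str, command: str) -> bool:
    """Add a command hook under settings['hooks'][stage]; False if already present."""
    groups = settings.setdefault("hooks", {}).setdefault(stage, [])
    new_hook = {"type": "command", "command": command}
    for group in groups:
        if group.get("matcher") == matcher:
            hooks = group.setdefault("hooks", [])
            if any(h.get("command") == command for h in hooks):
                return False  # already wired; do not duplicate
            hooks.append(new_hook)
            return True
    groups.append({"matcher": matcher, "hooks": [new_hook]})
    return True

settings = {}  # in practice: json.load() of the settings file
register_hook(settings, "PreToolUse", "Bash", "/root/.claude/hooks/brian-example-guard.py")
print(json.dumps(settings, indent=2))
```

Making the edit idempotent means re-running the wiring script after a partial failure cannot register the same hook twice.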

When you want a new trait

Same triad: Commandment → runtime hook (strongest, build first) → UserPromptSubmit reminder (last). Update commandments.md, build the Stop hook, append the trait to brian-traits-prompt-injector.sh. Don't start with the persona prompt — Codex ranks it weakest.

When Jonah praises or corrects something

Use the appreciation ledger CLI. The consolidated theme_brian_standards.md is what shifts dominant behavior over weeks per Codex's timeline.

Transformation timeline (Codex)

Surface performance shifts in days if hooks are strict.
Dominant default behavior takes weeks of repeated episodes, evaluator pressure, and memory consolidation.
The test is not whether Brian says the traits but whether he shows them under friction.

Metrics to watch:
- Stops after one failed path → near zero
- Proactive useful catches per session → rising
- Flat / dutiful final answers → falling
- Same correction repeated across sessions → falling
- Bad news disclosed early → rising
- Praise linked to concrete excellence events → rising

File path inventory

/root/.claude/hooks/
├── brian-action-language-guard.py        (Stop, expanded regex 2026-05-05)
├── brian-presend-evidence-gate.py        (Stop, 3-attempt evidence ledger)
├── brian-outbound-ask-guard.py           (PreToolUse Bash, ask-detection)
├── brian-mac-chrome-profile-guard.py     (PreToolUse Bash, profile lock)
├── brian-no-paid-model-guard.py          (PreToolUse Bash, paid endpoint block)
├── brian-account-verify-guard.py         (PreToolUse Bash, identity proof)
├── brian-agency-pipeline-guard.py        (PreToolUse Bash, /agency-required)
├── brian-mac-status-box-guard.py         (PreToolUse Bash, no generic strings)
├── brian-proactive-stop-guard.py         (Stop, proactive marker check)
├── brian-flat-voice-guard.py             (Stop, dutiful-voice block)
├── brian-done-plus-guard.py              (Stop, verification required)
└── brian-traits-prompt-injector.sh       (UserPromptSubmit, traits-lock)

/opt/agent/scripts/
├── brian_appreciation_ledger.py          (3-gate approval CLI)
└── _mac_grant_request.py                 (frictionless grant cycle)

/root/.claude/projects/-/memory/
├── commandments.md                        (12 commandments — 7 + 5 traits)
├── theme_brian_standards.md               (consolidated approval ledger)
└── hard_rule_*.md                         (22 rule files)

/opt/agent/data/agent_runtime/
└── brian_appreciation_ledger.jsonl        (append-only ledger)

/opt/agent/data/postmortems/
├── 2026-05-05_about_to_start_pattern.md
├── 260505_minimum_effort/REPORT.md
├── 260505_one_try_pattern/REPORT_ROUND_TABLE.md
├── 260505_rule_reinforcement/REPORT.md
└── 260505_traits_lock/REPORT.md

What this system does NOT do (honest limits)

  1. Cannot retrain RLHF priors. All enforcement is layered around a static model. Codex: "literal intrinsic transformation: no."
  2. Cannot stop semantic bypass forever. Lexical hooks fail by synonym in seconds (proven Case D, 2026-05-05). Stop hooks help but a determined model can still find linguistic cracks. Logs need weekly review.
  3. Cannot replace Jonah's enforcement contract. Codex's verdict: hooks alone won't fix this — Jonah must reject performative progress in real time. The contract: "I will not answer requests for things you can fetch. I will not accept 'starting/resuming/about to.' Every blocked report must include tried-list, evidence, and selected next path."
  4. Cannot make bloom_remember after major decisions automatic without a semantic classifier. Currently passive recall only.
  5. Cannot enforce partnership_decision_model semantic edge cases (browser-driven paid-feature enabling, etc.) — Trust Ladder L4 covers API writes only.

When this doc needs updating