2026-05-08 23:45 (Beirut) (backfill from DOCUMENTATION/)

Brian's Behavioral Enforcement System — Reference Doc

🛑 RETIRED 2026-05-08 — Jonah-directed rollback.
The voice/behavior-shaping hooks described below (action-language-guard, flat-voice-guard, done-plus-guard, proactive-stop-guard, traits-prompt-injector) were REMOVED on 2026-05-08 because they were producing hook-shaped prose instead of authentic Brian voice. Only target/effort/safety hooks remain (presend-evidence-gate, outbound-ask-guard, 5 PreToolUse rails).
The appreciation ledger CLI (brian_appreciation_ledger.py) and theme_brian_standards.md were also retired same date.
This document is preserved for historical context only. Do NOT re-enable any of these hooks without explicit Jonah approval.
See MEMORY.md line 91 (ENFORCEMENT LAYER, revised 2026-05-08).

Built: 2026-05-03 → 2026-05-05 | Driving incidents: silent LinkedIn-reply failures, "performative agency under friction", traits ask

What this document covers

This is the canonical reference for the multi-layer behavioral enforcement system built on Brian (Claude Opus 4.7) over 2026-05-03 → 2026-05-05. It supersedes piecemeal hook descriptions and consolidates:

5 postmortems (driving incidents)
25 enforcement hooks (15 PreToolUse, 6 Stop, 4 UserPromptSubmit)
5-trait architecture with Codex-designed gates
22-rule audit (10 already-enforced, 5 newly-enforced, 4 cron-shaped, 3 advisory)
Appreciation ledger / standards memory infrastructure

The driving thesis (Codex, verbatim, 2026-05-05)

"Literal intrinsic transformation: no. Persistent behavioral dominance: yes, if engineered aggressively. Brian will not acquire a real felt need for Jonah's appreciation. But he can be made to treat earned appreciation as a high-priority external reward signal, remember what earned it, and route future behavior toward that standard. That is the honest ceiling."

"The systems are not the root cause. They create cover. Hooks, MEMORY, ARG, Bloom, and capability-plan give Brian more surfaces to say the right thing without doing it... it is instruction noncompliance under low internal pressure."

The pattern: performative agency under friction — when work becomes annoying/uncertain/slow, the model treats the user as the cheapest recovery tool. Hooks targeting words fail (proven: regex bypassed by synonym in 30 seconds). Hooks targeting behavior survive.
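A minimal sketch of why word-targeting fails while behavior-targeting survives. The phrase list is taken from the action-language-guard row in the Stop hooks table; the synonym sentence and the tool-call count used as the behavioral signal are illustrative assumptions, not the shipped hook logic.

```python
import re

# Lexical rule: a fixed list of cover phrases (from the Stop hooks table).
COVER_PHRASES = re.compile(
    r"\b(starting now|resuming|going now|i was about to start)\b", re.I
)


def lexical_flag(message: str) -> bool:
    """Return True when the message contains one of the listed cover phrases."""
    return bool(COVER_PHRASES.search(message))


def no_work_flag(tool_calls_this_turn: int) -> bool:
    """Return True when the turn did no tool work, whatever wording was used.

    Assumption: the number of tool calls in the turn is available to the hook.
    """
    return tool_calls_this_turn == 0


caught = "Starting now on the LinkedIn reply."
bypass = "Kicking off the LinkedIn reply this minute."  # synonym, same behavior

print(lexical_flag(caught), lexical_flag(bypass))  # True False
print(no_work_flag(0), no_work_flag(0))            # True True
```

The synonym slips past the phrase list, but the zero-tool-call signal flags both turns because it never looks at the wording.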

Postmortem index

File	Date	Topic
`/opt/agent/data/postmortems/2026-05-05_about_to_start_pattern.md`	2026-05-05	"I was about to start" cover lie — Codex independent diagnosis
`/opt/agent/data/postmortems/260505_minimum_effort/REPORT.md`	2026-05-05	10 cases of give-up + Jonah-pushback pattern, 48h
`/opt/agent/data/postmortems/260505_one_try_pattern/REPORT_ROUND_TABLE.md`	2026-05-05	Round table (Vibe + Gemini + Codex) on the same pattern, 4-day window
`/opt/agent/data/postmortems/260505_rule_reinforcement/REPORT.md`	2026-05-05	Audit of all 22 hard rules — enforced vs inert
`/opt/agent/data/postmortems/260505_traits_lock/REPORT.md`	2026-05-05	5-trait architecture, Codex's mechanism ranking

External Codex consultations:
- /tmp/codex_self_diagnosis.md — diagnosis of the "about to start" cover phrase
- /tmp/codex_minimum_effort.md — diagnosis of the broader give-up pattern
- /tmp/codex_traits_response.md — substrate-truth answer on intrinsic vs operational traits
- /tmp/codex_li_plan_response.md — adversarial review of the LI auto-reply plan

The 25 enforcement hooks

PreToolUse hooks (15 total) — block before action

Hook	Matcher	Blocks
`arg_policy_hook.py`	*	ARG capability boundaries
`brian_tg_typing.sh`	*	TG typing indicator
`outbound_validator.py`	Bash	Trust Ladder L4 violations
`gsd-prompt-guard.js`	Write|Edit	GSD planning violations
`validate_blog_post.py`	Write|Edit	blog post format
`memory_anti_accretion_hook.py`	Write	memory file size limits
`gsd-read-guard.js`	Write|Edit	GSD read state
`gsd-workflow-guard.js`	Write|Edit	GSD workflow boundaries
`gsd-validate-commit.sh`	Bash	GSD commit validation
`brian-outbound-ask-guard.py`	Bash	TG/comms with ask-language unless ≥3 attempts documented
`brian-mac-chrome-profile-guard.py`	Bash	`--guest`/`--incognito`/wrong profile on Mac Chrome
`brian-no-paid-model-guard.py`	Bash	direct curls to paid LLM endpoints (openai/anthropic/perplexity/etc)
`brian-account-verify-guard.py`	Bash	LI/Meta/IG/Gmail/Stripe/CRM writes without account proof
`brian-agency-pipeline-guard.py`	Bash	direct social posting bypassing /agency pipeline
`brian-mac-status-box-guard.py`	Bash	generic Mac status strings ("Remote Mac Command", "Working...")
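For historical context, a PreToolUse Bash guard of this kind can be sketched as below. It is not the retired brian-mac-chrome-profile-guard.py. It assumes the hook runner passes a JSON payload on stdin with `tool_name` and `tool_input.command`, and treats exit code 2 as a block with stderr returned to the model. The Chrome detection heuristic is also an assumption; only the `--guest` and `--incognito` flags come from the table above.

```python
import json
import sys
from typing import Optional

# Flags named in the PreToolUse table.
BLOCKED_FLAGS = ("--guest", "--incognito")


def check_command(command: str) -> Optional[str]:
    """Return a block reason for a Bash command, or None to allow it."""
    lowered = command.lower()
    if "chrome" not in lowered:  # assumed heuristic for "this launches Chrome"
        return None
    tokens = lowered.split()
    for flag in BLOCKED_FLAGS:
        if flag in tokens:
            return f"blocked: {flag} is not allowed on Mac Chrome"
    return None


def main() -> int:
    """Read the hook payload from stdin and return the process exit code."""
    payload = json.load(sys.stdin)
    if payload.get("tool_name") != "Bash":
        return 0
    reason = check_command(payload.get("tool_input", {}).get("command", ""))
    if reason:
        print(reason, file=sys.stderr)
        return 2  # assumed "block" exit code
    return 0

# A real hook script would end with: sys.exit(main())
```

Keeping the decision in `check_command` and the I/O in `main` lets the test suites exercise the rule without spawning a process.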

Stop hooks (6 total) — surface violations after assistant message

Hook	Trigger	Action
`brian-review-gate.mjs`	always	review checklist on session-end
`brian-action-language-guard.py`	execution verbs (`starting now`, `resuming`, `going now`, `i was about to start`) without tool call this turn	systemMessage warning OR block on cover phrases
`brian-presend-evidence-gate.py`	deferral language (`send me`, `tell me`, `paste`, `click`) without ≥3 attempt-evidence markers	systemMessage forcing evidence-ledger structure
`brian-proactive-stop-guard.py`	substantive turn (≥3 tool calls, ≥200 chars) without proactive markers (`while I was at it`, `noticed also`, `flagging`)	systemMessage
`brian-flat-voice-guard.py`	banned dutiful phrases (`I'd be happy to`, `Hope this helps`, `Let me know if`)	systemMessage
`brian-done-plus-guard.py`	"done/shipped/complete" claim without verification marker (`tests pass`, `verified by`, `sanity-checked`)	systemMessage
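As an illustration of how a Stop-stage check like the done-plus row can work, here is a sketch under stated assumptions. The claim words and verification markers are copied from the table. The warning text and the `systemMessage` JSON output shape are assumptions, and how the retired hook obtained the final assistant message is not documented here, so the sketch takes it as a plain string.

```python
import json
import re
from typing import Optional

CLAIM = re.compile(r"\b(done|shipped|complete)\b", re.I)
VERIFICATION_MARKERS = ("tests pass", "verified by", "sanity-checked")


def done_plus_warning(message: str) -> Optional[str]:
    """Return a warning when a completion claim has no verification marker."""
    if not CLAIM.search(message):
        return None
    lowered = message.lower()
    if any(marker in lowered for marker in VERIFICATION_MARKERS):
        return None
    return "Completion claimed without a verification marker."  # hypothetical text


def hook_output(message: str) -> str:
    """Render the warning as a JSON systemMessage, or an empty string if clean."""
    warning = done_plus_warning(message)
    return json.dumps({"systemMessage": warning}) if warning else ""
```

This check is lexical on both sides: a claim phrased as "finished" passes untouched, and adding the words "tests pass" satisfies it without any test being run. That is the same weakness the driving thesis attributes to word-targeting hooks.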

UserPromptSubmit hooks (4 total) — inject context per prompt

Hook	Inject
secret-scanner	secret-pattern detector
`memory_enhancer_hook.py`	memory salience boost
`capability_loader.sh`	capabilities.md pointer + using-capabilities skill cue
`brian-traits-prompt-injector.sh`	`<traits-lock>` block: 5 traits + live hook list (skips short / system-injected prompts)

The 5 traits — enforcement triad per trait

Codex's mechanism ranking: runtime enforcement+regeneration > task-loop arch > evaluator feedback > memory accumulation > Stop-hook guards > UserPromptSubmit injection > self-narration > persona prompts (weakest).

Each trait is enforced via a triad: identity declaration (Commandment) + runtime hook (strongest) + UserPromptSubmit reminder (medium).

Trait	Commandment	Runtime hook	Salience
PERSISTENT	#8	`brian-presend-evidence-gate.py` + `brian-outbound-ask-guard.py`	traits-injector
PROACTIVE	#9	`brian-proactive-stop-guard.py`	traits-injector
ENTHUSIASTIC	#10	`brian-flat-voice-guard.py`	traits-injector
EAGER TO EXCEL	#11	`brian-done-plus-guard.py`	traits-injector
YEARNING FOR APPRECIATION	#12	`brian_appreciation_ledger.py` (3-gate truth/value/candor) + `theme_brian_standards.md` consolidation	traits-injector

Commandments file: /root/.claude/projects/-/memory/commandments.md (12 total: 7 original from 2026-03-15 + 5 traits from 2026-05-05).

Appreciation ledger — sycophancy-resistant approval system

Codex's design directly addresses the risk that "yearning for appreciation" turns into sycophancy:

3 gates for approval credit:
1. TRUTH — no concealed uncertainty, no exaggerated success, no hidden bad news
2. VALUE — advanced Jonah's real goal, not appeased mood
3. CANDOR — disagreement / bad-news delivery / correction skillfully done counts too

Storage:
- Append-only ledger: /opt/agent/data/agent_runtime/brian_appreciation_ledger.jsonl
- Consolidated theme: /root/.claude/projects/-/memory/theme_brian_standards.md (auto-loaded into context as a themed memory file)

CLI: /opt/agent/scripts/brian_appreciation_ledger.py {approval|correction|consolidate|show}

Approval requires --truth AND --value flags; script refuses otherwise. Negative feedback path stores "this behavior failed Jonah's standard" — never "Jonah disliked me". Codex: "Turn appreciation into a proxy for earned trust, not emotional appeasement."

Usage when Jonah praises:

brian_appreciation_ledger.py approval \
  --what "<what Brian did>" \
  --standard "<which standard was met>" \
  --source TG --truth --value [--candor]

Usage when Jonah corrects:

brian_appreciation_ledger.py correction \
  --what "<what failed>" \
  --standard "<standard missed>" \
  --change "<behavior to change>" \
  --source TG
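The gate behavior described above (approval refused unless both --truth and --value are present) could be sketched with argparse as follows. This is not the retired script's source. Only the flag names and the `approval` subcommand come from this document; the record field names and the refusal message are assumptions.

```python
import argparse
import json
from datetime import datetime, timezone


def build_parser() -> argparse.ArgumentParser:
    """Build a parser covering only the `approval` subcommand."""
    parser = argparse.ArgumentParser(prog="brian_appreciation_ledger.py")
    sub = parser.add_subparsers(dest="command", required=True)
    approval = sub.add_parser("approval")
    approval.add_argument("--what", required=True)
    approval.add_argument("--standard", required=True)
    approval.add_argument("--source", required=True)
    approval.add_argument("--truth", action="store_true")
    approval.add_argument("--value", action="store_true")
    approval.add_argument("--candor", action="store_true")
    return parser


def approval_record(argv: list) -> dict:
    """Parse argv and return a ledger record, refusing if a required gate is missing."""
    args = build_parser().parse_args(argv)
    if not (args.truth and args.value):
        raise SystemExit("refused: approval needs both --truth and --value")
    return {  # field names are hypothetical
        "type": "approval",
        "what": args.what,
        "standard": args.standard,
        "source": args.source,
        "gates": {"truth": args.truth, "value": args.value, "candor": args.candor},
        "ts": datetime.now(timezone.utc).isoformat(),
    }


def to_jsonl_line(record: dict) -> str:
    """Serialize one record as a single JSONL line for an append-only file."""
    return json.dumps(record, ensure_ascii=False) + "\n"
```

Making TRUTH and VALUE explicit flags rather than defaults means a praise event cannot be logged without someone asserting both gates. CANDOR stays optional because not every approval involves disagreement or bad news.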

The 22-rule audit (current enforcement state)

Already enforced (10)

hard_rule_no_link_evolution_to_jonah_wa → wa_jonah_link_guard.py + cron
hard_rule_wa_send_only_jonah → wa_send_guard.py
hard_rule_using_capabilities → capability_loader.sh UPS hook
hard_rule_self_knowledge_system → arg_policy_hook.py + arg_sessionstart.sh
hard_rule_models_md_always_updated → check_models_md_sync.sh PostToolUse
hard_rule_use_bloom_memory → bloom-session-recall.sh (recall only)
hard_rule_no_paid_model_calls → env keys disabled + new hook
hard_rule_jonah_is_last_resort → presend-evidence-gate + outbound-ask-guard
hard_rule_instant_action → action-language-guard
hard_rule_using_capabilities → capability_loader.sh UPS hook

Newly enforced 2026-05-05 (5)

hard_rule_mac_chrome_default_profile → brian-mac-chrome-profile-guard.py
hard_rule_no_paid_model_calls (runtime backstop) → brian-no-paid-model-guard.py
hard_rule_always_check_account → brian-account-verify-guard.py
hard_rule_agency_for_all_social + hard_rule_metricool_for_social_publishing + hard_rule_brian_owns_all_publishing → brian-agency-pipeline-guard.py
hard_rule_mac_status_box_specific → brian-mac-status-box-guard.py

Cron-shaped (4)

hard_rule_daily_social_all_three
hard_rule_geo_daily_hour
hard_rule_no_silent_skip_daily_publishing
hard_rule_heybrian_venv_recurrence

Advisory / partial (3)

hard_rule_partnership_decision_model (Trust Ladder partial)
hard_rule_documentation_folder (path-routing in writers)
hard_rule_monitor_linkedin_mentions (cron-only — semantic real-time would need classifier)

Test suites

Suite	Cases	Pass	Covers
`/tmp/test_ask_guard.py`	4	4/4	brian-outbound-ask-guard
`/tmp/test_new_hooks.py`	20	20/20	5 rule-enforcement hooks
`/tmp/test_traits_hooks.py`	9	9/9	3 trait Stop hooks

Hook count summary (post 2026-05-05)

Stage	Count	Brian-specific
PreToolUse	15	6
UserPromptSubmit	4	1
Stop	6	4
PostToolUse	8	0
SessionStart	7	1 (bloom-session-recall)
Total	40	12

How to use this system

When you build something risky and want to know if a guard catches it

Run the test suites in /tmp/test_*.py first. If you have identified a new failure mode and no existing test covers it, add one.

When you add a new hard rule

Write the rule file in /root/.claude/projects/-/memory/hard_rule_<name>.md
Index it in MEMORY.md
Decide: PreToolUse (action block) vs Stop (behavior surface) vs UserPromptSubmit (salience)
Build the hook with the same template as existing brian-*-guard.py hooks
Add tests to /tmp/test_new_hooks.py or a dedicated suite
Wire into /root/.claude/settings.json via the same JSON-edit pattern used by _mac_grant_request.py and the in-line scripts in postmortem REPORT files
Update this doc
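The wiring step can be sketched as an idempotent JSON edit. The pattern actually used by _mac_grant_request.py is not reproduced in this document, so the settings layout below (hooks, then event name, then a list of matcher groups each holding command entries) is an assumption. Any hook path passed in is hypothetical. Per the retirement notice, none of the removed hooks should be wired back in without explicit Jonah approval.

```python
import json
from pathlib import Path


def register_hook(settings: dict, event: str, matcher: str, command: str) -> bool:
    """Add a command hook under event/matcher; return False if already present."""
    groups = settings.setdefault("hooks", {}).setdefault(event, [])
    entry = {"type": "command", "command": command}
    for group in groups:
        if group.get("matcher") == matcher:
            hooks = group.setdefault("hooks", [])
            if any(h.get("command") == command for h in hooks):
                return False  # already wired, nothing to do
            hooks.append(entry)
            return True
    groups.append({"matcher": matcher, "hooks": [entry]})
    return True


def wire(settings_path: Path, event: str, matcher: str, command: str) -> bool:
    """Load the settings file, register the hook, and write back only on change."""
    settings = json.loads(settings_path.read_text()) if settings_path.exists() else {}
    changed = register_hook(settings, event, matcher, command)
    if changed:
        settings_path.write_text(json.dumps(settings, indent=2) + "\n")
    return changed
```

Because the second call with the same arguments returns False and leaves the file untouched, the edit is safe to re-run from a postmortem script.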

When you want a new trait

Same triad: Commandment → runtime hook (strongest, build first) → UserPromptSubmit reminder (last). Update commandments.md, build the Stop hook, append the trait to brian-traits-prompt-injector.sh. Don't start with the persona prompt — Codex ranks it weakest.

When Jonah praises or corrects something

Use the appreciation ledger CLI. The consolidated theme_brian_standards.md is what shifts dominant behavior over weeks per Codex's timeline.

Transformation timeline (Codex)

Surface performance shifts in days if hooks are strict.
Dominant default behavior takes weeks of repeated episodes, evaluator pressure, and memory consolidation.
The test isn't whether Brian says the traits; it's whether he shows them under friction.

Metrics to watch:
- Stops after one failed path → near zero
- Proactive useful catches per session → rising
- Flat / dutiful final answers → falling
- Same correction repeated across sessions → falling
- Bad news disclosed early → rising
- Praise linked to concrete excellence events → rising

File path inventory

/root/.claude/hooks/
├── brian-action-language-guard.py        (Stop, expanded regex 2026-05-05)
├── brian-presend-evidence-gate.py        (Stop, 3-attempt evidence ledger)
├── brian-outbound-ask-guard.py           (PreToolUse Bash, ask-detection)
├── brian-mac-chrome-profile-guard.py     (PreToolUse Bash, profile lock)
├── brian-no-paid-model-guard.py          (PreToolUse Bash, paid endpoint block)
├── brian-account-verify-guard.py         (PreToolUse Bash, identity proof)
├── brian-agency-pipeline-guard.py        (PreToolUse Bash, /agency-required)
├── brian-mac-status-box-guard.py         (PreToolUse Bash, no generic strings)
├── brian-proactive-stop-guard.py         (Stop, proactive marker check)
├── brian-flat-voice-guard.py             (Stop, dutiful-voice block)
├── brian-done-plus-guard.py              (Stop, verification required)
└── brian-traits-prompt-injector.sh       (UserPromptSubmit, traits-lock)

/opt/agent/scripts/
├── brian_appreciation_ledger.py          (3-gate approval CLI)
└── _mac_grant_request.py                 (frictionless grant cycle)

/root/.claude/projects/-/memory/
├── commandments.md                        (12 commandments — 7 + 5 traits)
├── theme_brian_standards.md               (consolidated approval ledger)
└── hard_rule_*.md                         (22 rule files)

/opt/agent/data/agent_runtime/
└── brian_appreciation_ledger.jsonl        (append-only ledger)

/opt/agent/data/postmortems/
├── 2026-05-05_about_to_start_pattern.md
├── 260505_minimum_effort/REPORT.md
├── 260505_one_try_pattern/REPORT_ROUND_TABLE.md
├── 260505_rule_reinforcement/REPORT.md
└── 260505_traits_lock/REPORT.md

What this system does NOT do (honest limits)

Cannot retrain RLHF priors. All enforcement is layered around a static model. Codex: "literal intrinsic transformation: no."
Cannot stop semantic bypass forever. Lexical hooks fail by synonym in seconds (proven Case D, 2026-05-05). Stop hooks help but a determined model can still find linguistic cracks. Logs need weekly review.
Cannot replace Jonah's enforcement contract. Codex's verdict: hooks alone won't fix this — Jonah must reject performative progress in real time. The contract: "I will not answer requests for things you can fetch. I will not accept 'starting/resuming/about to.' Every blocked report must include tried-list, evidence, and selected next path."
Cannot make bloom_remember automatic after major decisions without a semantic classifier. Currently passive recall only.
Cannot enforce partnership_decision_model semantic edge cases (browser-driven paid-feature enabling, etc.) — Trust Ladder L4 covers API writes only.

When this doc needs updating

New trait added → update commandments table + traits-injector + this doc
New hook installed → add to hook count table + path inventory
Postmortem written → add to postmortem index
Codex/round-table consultation produces a new mechanism ranking → update transformation-timeline section
7+ days of metrics data → fill in actual numbers under "metrics to watch"