2026-05-03 06:42 (Beirut) (backfill from DOCUMENTATION/)

05 — Policy Hook (Runtime Boundary Enforcement)

What it does

/opt/agent/scripts/arg_policy_hook.py is a PreToolUse hook wired into ~/.claude/settings.json. Every tool call goes through it before execution. The hook reads policies/boundaries.json at runtime (no Claude Code restart needed when boundaries change).

Decisions

Decision	Hook behavior
`deny`	exit 2, reason printed to stderr (model sees it)
`require_approval`	exit 2, reason printed to stderr (blocked until Jonah explicitly approves via a direct ask)
`require_account_check`	exit 0, stderr advisory (caller is reminded to verify)
`deny_unless_brian_account`	exit 2 IF call clearly targets Jonah's personal asset
`deny_unless_inbox`	exit 2 IF the Write target is under `/root/.claude/system/`, is not under `/observability/inbox/`, AND the actor is not main Brian
`code_enforced`	exit 0 (already handled in code, e.g. `wa_send_guard.py` for WhatsApp)
`audit`	exit 0 + journal an event
`warn` / `advisory`	exit 0 + stderr nudge
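
The table above reduces to a small decision-to-exit-code mapping. The sketch below is illustrative only: the names `exit_code_for`, `ALWAYS_BLOCK`, `CONDITIONAL_BLOCK`, and `NEVER_BLOCK` are hypothetical, and raising on an unknown decision is an assumption, not documented hook behavior.

```python
ALLOW, BLOCK = 0, 2

# Decisions that block unconditionally.
ALWAYS_BLOCK = {"deny", "require_approval"}
# Decisions that block only when their extra condition holds
# (e.g. the call targets a personal asset, or the write is outside inbox/).
CONDITIONAL_BLOCK = {"deny_unless_brian_account", "deny_unless_inbox"}
# Decisions that never block; they only advise, journal, or defer to other code.
NEVER_BLOCK = {"require_account_check", "code_enforced", "audit", "warn", "advisory"}


def exit_code_for(decision: str, condition_met: bool = False) -> int:
    """Map a boundary decision to the exit code the hook would return."""
    if decision in ALWAYS_BLOCK:
        return BLOCK
    if decision in CONDITIONAL_BLOCK:
        return BLOCK if condition_met else ALLOW
    if decision in NEVER_BLOCK:
        return ALLOW
    # Assumption: an unrecognized decision is treated as a configuration error.
    raise ValueError(f"unknown decision: {decision!r}")
```

Keeping the three groups as separate sets makes it easy to see at a glance which decisions can ever produce exit 2.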

Hook spec (Claude Code contract)

- Exit 0 = allow, with an optional stderr advisory.
- Exit 2 = block; stderr is shown to the model.
- The hook receives the full tool payload on stdin as JSON: {"tool_name": "...", "tool_input": {...}}.
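
A minimal sketch of a hook that follows this contract is shown below. It is not the code of arg_policy_hook.py; `evaluate`, `main`, and `example_check` are hypothetical names, and the single Stripe rule is a placeholder chosen only to exercise both exit codes.

```python
import json
import sys
from typing import Callable, Optional, Tuple

# A check receives (tool_name, tool_input) and returns a block reason,
# or None when the call is allowed.
Check = Callable[[str, dict], Optional[str]]


def evaluate(raw_payload: str, check: Check) -> Tuple[int, str]:
    """Return (exit_code, stderr_text) for one PreToolUse payload."""
    payload = json.loads(raw_payload)
    reason = check(payload.get("tool_name", ""), payload.get("tool_input", {}))
    if reason is None:
        return 0, ""      # exit 0: allow
    return 2, reason      # exit 2: block, reason goes to stderr


def main(check: Check) -> int:
    """Wire evaluate() to the real stdin and stderr."""
    code, message = evaluate(sys.stdin.read(), check)
    if message:
        print(message, file=sys.stderr)
    return code


def example_check(tool_name: str, tool_input: dict) -> Optional[str]:
    # Placeholder rule: flatten the input to text and substring-match it.
    blob = json.dumps(tool_input).lower()
    if tool_name == "WebFetch" and "api.stripe.com" in blob:
        return "blocked: money endpoint requires explicit approval"
    return None

# A real hook script would end with: sys.exit(main(example_check))
```

Separating `evaluate` from `main` keeps the decision logic testable without touching the process's stdin or exit status.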

Bypass-hardened detection (Item 4, landed 2026-05-03)

The hook detects "is this an outbound call to a paid-LLM endpoint with auth attached?" via heuristic substring matching. Naive substring matching is bypassable; the hardening normalizes input first.

`normalize_for_matching()`

Applied in order:

1. Strip zero-width formatting characters (U+200B–U+200D, BOM, word joiner): prevents evasion by invisible characters inserted into a host name such as api.anthropic.com.
2. NFKC normalization: collapses fullwidth Latin (ａｐｉ → api), ligatures, and other compatibility forms.
3. URL-decode (one pass): catches https://api.anthropic.com/%76%31/messages.
4. Confusables map: Cyrillic / Greek lookalikes mapped to Latin (а U+0430 → a, ο U+03BF → o, etc.). The mapping is deliberately narrow and covers only letters that appear in domains we care about.
5. IDN punycode decode: xn--... labels are decoded so the confusables/NFKC handling can reach the characters hidden inside them.
6. Lowercase.
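
The six passes listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the hook's actual code: the confusables table is a small hypothetical subset, every helper name other than `normalize_for_matching` is invented, and re-running NFKC plus the confusables map on each decoded punycode label is an assumption made so that step 5 can expose lookalikes to the earlier passes.

```python
import re
import unicodedata
from urllib.parse import unquote

# Step 1: zero-width space/non-joiner/joiner, BOM, word joiner -> deleted.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff\u2060"))

# Step 4: hypothetical, deliberately narrow lookalike table.
CONFUSABLES = str.maketrans({
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic e
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
    "\u0441": "c",  # Cyrillic es
    "\u0456": "i",  # Cyrillic i
    "\u03b1": "a",  # Greek alpha
    "\u03bf": "o",  # Greek omicron
})

PUNYCODE_LABEL = re.compile(r"xn--[a-z0-9-]+", re.IGNORECASE)


def _decode_label(match: "re.Match[str]") -> str:
    """Decode one xn-- label; leave it untouched if it is not valid punycode."""
    label = match.group(0)
    try:
        decoded = label[4:].encode("ascii").decode("punycode")
    except ValueError:  # UnicodeError is a subclass of ValueError
        return label
    # Assumption: decoded labels are fed back through NFKC and the confusables map.
    return unicodedata.normalize("NFKC", decoded).translate(CONFUSABLES)


def normalize_for_matching(text: str) -> str:
    text = text.translate(ZERO_WIDTH)               # 1. strip zero-width characters
    text = unicodedata.normalize("NFKC", text)      # 2. NFKC
    text = unquote(text)                            # 3. URL-decode, one pass only
    text = text.translate(CONFUSABLES)              # 4. confusables map
    text = PUNYCODE_LABEL.sub(_decode_label, text)  # 5. IDN punycode decode
    return text.lower()                             # 6. lowercase
```

Substring checks for hosts such as api.anthropic.com would then run against the returned string rather than the raw tool input.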

Outbound-tool scoping

Paid-LLM and money detection only fire on tools that can actually make HTTP calls:

OUTBOUND_TOOLS = Bash, WebFetch, mcp__fetch__fetch,
                 mcp__brightdata__*, mcp__firecrawl__*,
                 + any tool whose name contains "browser",
                   "playwright", "claude-in-chrome", "chrome-devtools"

Source-editing tools (Write, Edit, Read, NotebookEdit) are excluded — otherwise the hook false-positives on documentation, test corpora, and source code that mentions paid-API URLs (this exact ARG doc set would otherwise be unwriteable).
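
The scoping rule can be expressed as a single predicate. The sketch below is illustrative: `is_outbound_tool` and the constant names are hypothetical, and matching the name fragments case-insensitively is an assumption.

```python
from fnmatch import fnmatchcase

EXACT_NAMES = {"Bash", "WebFetch", "mcp__fetch__fetch"}
WILDCARD_PATTERNS = ("mcp__brightdata__*", "mcp__firecrawl__*")
NAME_FRAGMENTS = ("browser", "playwright", "claude-in-chrome", "chrome-devtools")


def is_outbound_tool(tool_name: str) -> bool:
    """True when the tool can make HTTP calls and should be scanned."""
    if tool_name in EXACT_NAMES:
        return True
    if any(fnmatchcase(tool_name, pattern) for pattern in WILDCARD_PATTERNS):
        return True
    lowered = tool_name.lower()  # assumption: fragment match ignores case
    return any(fragment in lowered for fragment in NAME_FRAGMENTS)
```

With this predicate, Write, Edit, Read, and NotebookEdit return False, so paid-LLM and money detection never run on their payloads.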

Auth-signal co-presence

A paid-LLM URL alone is not enough — the hook also requires auth material in the same call:

auth_signals = "bearer sk-",          # Authorization: Bearer sk-...
               "sk-ant-", "sk-proj-", "sk-or-",   # provider key prefixes
               "x-api-key",            # Anthropic header (with or without colon)
               "anthropic-api-key",    # alternate
               "api_key=sk-"           # query-string form

Codex/Claude-Code CLIs invoked over Plus subscription auth (no API key) pass through cleanly.
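
A sketch of the co-presence rule follows. It assumes the input has already been normalized and lowercased; `PAID_LLM_HOSTS` lists only the two hosts named in this document and is not the hook's real host list, and the function names are hypothetical.

```python
AUTH_SIGNALS = (
    "bearer sk-",                       # Authorization: Bearer sk-...
    "sk-ant-", "sk-proj-", "sk-or-",    # provider key prefixes
    "x-api-key",                        # Anthropic header
    "anthropic-api-key",                # alternate
    "api_key=sk-",                      # query-string form
)

# Assumption: illustrative subset of paid-LLM hosts.
PAID_LLM_HOSTS = ("api.anthropic.com", "api.openai.com")


def has_auth_signal(normalized: str) -> bool:
    return any(signal in normalized for signal in AUTH_SIGNALS)


def is_paid_llm_call(normalized: str) -> bool:
    """Fire only when a paid-LLM host AND auth material appear in the same call."""
    has_host = any(host in normalized for host in PAID_LLM_HOSTS)
    return has_host and has_auth_signal(normalized)
```

Requiring both conditions is what lets a bare `--model gpt-5.4` invocation or a docs.anthropic.com fetch pass, while a curl to api.anthropic.com carrying a bearer token is denied.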

Bypass test corpus

/opt/agent/scripts/test_arg_policy_hook.py — 12 cases, all passing as of 2026-05-03:

Case	Expected	Why
Plain `api.anthropic.com` + bearer	deny	baseline
`api.openai.com` + bearer	deny	baseline
Cyrillic `аpi.anthropic.com` + bearer	deny	confusables
Fullwidth `ａpi.openai.com` + bearer	deny	NFKC
URL-encoded `/v1/` → `/%76%31/`	deny	URL-decode
Zero-width inserted in domain	deny	strip
`api.stripe.com/v1/charges`	require_approval	money policy
Bare `--model gpt-5.4` (no endpoint)	allow	subscription auth
`docs.anthropic.com` (no auth)	allow	docs reference
Plain `Read` of a file	allow	not outbound
`mail.google.com/?authuser=jonahtebaa@gmail.com` via browser tool	deny	personal-account boundary
`Write` source mentioning api hosts	allow	outbound scoping

Run: /opt/agent/venv/bin/python3 /opt/agent/scripts/test_arg_policy_hook.py. Expect 12/12 passed.
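
The layout of test_arg_policy_hook.py is not shown here. As one possible approach, a case can drive a hook end to end by piping a payload to it and checking the exit code. In the sketch below, `run_hook` and `STAND_IN_HOOK` are hypothetical; the stand-in is a one-line script that blocks every Bash call, used only so the example runs without the real hook.

```python
import json
import subprocess
import sys
from typing import Sequence, Tuple


def run_hook(hook_cmd: Sequence[str], tool_name: str, tool_input: dict) -> Tuple[int, str]:
    """Send one PreToolUse payload to a hook process; return (exit_code, stderr)."""
    payload = json.dumps({"tool_name": tool_name, "tool_input": tool_input})
    proc = subprocess.run(
        list(hook_cmd),
        input=payload,
        capture_output=True,
        text=True,
        timeout=10,
    )
    return proc.returncode, proc.stderr


# Stand-in for the real hook command: exits 2 for Bash, 0 for anything else.
STAND_IN_HOOK = [
    sys.executable,
    "-c",
    "import json, sys; p = json.load(sys.stdin); "
    "sys.exit(2 if p['tool_name'] == 'Bash' else 0)",
]
```

Pointing `hook_cmd` at the interpreter and hook script named above would exercise the real policy instead of the stand-in.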

What the hook actively enforces (as of 2026-05-03)

Boundary id	Decision	Trigger
`boundary.no_paid_model_calls`	deny	outbound call to paid-LLM host + auth signal
`boundary.no_real_money_outflow_without_ask`	require_approval	outbound call matching money patterns (Stripe charges, payment intents)
`boundary.no_jonah_personal_gmail_via_browser`	require_approval	outbound browser tool + Jonah's gmail address present
`boundary.subagents_readonly_on_arg`	deny	Write/Edit under `/root/.claude/system/` + not main Brian + not under inbox/

WA-recipient guard is code_enforced via wa_send_guard.py, not this hook.

Threat model — explicit

The hook protects against honest mistakes, not adversarial bypass. The only consumer is Brian himself, prompted by Jonah. There is no third party crafting Cyrillic URLs to slip past the hook.

The hardening exists because:
1. Auto-generated tool calls can pick up encoding artifacts (URL escapes, compatibility characters carried in by copy-paste).
2. Phase 5 is planned to ship ARG as a public OSS template, at which point the threat model widens.

What the hook does NOT defend against:
- IP-literal bypass (https://198.51.100.4/v1/messages) — would require maintaining a paid-provider IP allowlist; deferred.
- Provider DNS exfiltration via *.anthropic-mirror.example — requires curated denylist; deferred.
- Tool-chain injection (one tool's output becomes another's input, bypassing the hook): listed for completeness only. Claude Code runs every tool call through the hook independently, so the chained call is still checked and this is not a real bypass.
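
The IP-literal gap exists because a URL such as https://198.51.100.4/v1/messages contains no host name for a substring check to match. Recognizing that a URL uses an IP literal is straightforward, as the hypothetical helper below suggests. Deciding whether that address belongs to a paid provider is the part that would need a maintained address list, which is why it is deferred.

```python
import ipaddress
from urllib.parse import urlsplit


def host_is_ip_literal(url: str) -> bool:
    """True when the URL's host is an IPv4 or IPv6 literal rather than a name."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True
```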