2026-05-03 06:42 (Beirut) (backfill from DOCUMENTATION/)

05 — Policy Hook (Runtime Boundary Enforcement)

What it does

/opt/agent/scripts/arg_policy_hook.py is a PreToolUse hook wired into ~/.claude/settings.json. Every tool call goes through it before execution. The hook reads policies/boundaries.json at runtime (no Claude Code restart needed when boundaries change).
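
As a rough illustration of the two inputs described above, the sketch below parses a PreToolUse payload from a stream and re-reads the boundaries file on every call instead of caching it. The function names, the default path constant, and the JSON shape of boundaries.json are assumptions made for this example; only the `tool_name` / `tool_input` payload fields come from the Claude Code hook contract.

```python
import json
from pathlib import Path
from typing import Any, TextIO

# Assumed location: the document only gives the relative path policies/boundaries.json.
BOUNDARIES_PATH = Path("policies/boundaries.json")


def load_boundaries(path: Path = BOUNDARIES_PATH) -> Any:
    """Re-read the policy file on each call (no caching), so edits apply
    to the very next tool call without restarting anything."""
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def read_tool_call(stream: TextIO) -> tuple[str, dict]:
    """Parse the JSON payload that Claude Code writes to a PreToolUse hook's stdin."""
    payload = json.load(stream)
    tool_name = payload.get("tool_name", "")
    tool_input = payload.get("tool_input") or {}
    return tool_name, tool_input
```

In a real hook the stream would be `sys.stdin`; taking it as a parameter keeps the parsing step easy to exercise in isolation.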

Decisions

| Decision | Hook behavior |
| --- | --- |
| deny | exit 2, reason printed to stderr (model sees it) |
| require_approval | exit 2, reason printed (Jonah must enable explicitly via direct ask) |
| require_account_check | exit 0, stderr advisory (caller is reminded to verify) |
| deny_unless_brian_account | exit 2 IF call clearly targets Jonah's personal asset |
| deny_unless_inbox | exit 2 IF Write target is /root/.claude/system/ and not under /observability/inbox/ AND actor is not main Brian |
| code_enforced | exit 0 (already handled in code, e.g. wa_send_guard.py for WhatsApp) |
| audit | exit 0 + journal an event |
| warn / advisory | exit 0 + stderr nudge |
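
The decision-to-exit-code mapping listed above can be sketched as a small lookup. This is a minimal illustration, not the hook's actual code: the set names, the `condition_met` parameter (standing in for the "IF ..." clauses), and the choice to raise on an unknown decision are all assumptions.

```python
EXIT_ALLOW = 0  # tool call proceeds
EXIT_BLOCK = 2  # tool call is blocked; stderr is shown to the model

# Grouping of the decisions listed above (set names are hypothetical).
ALWAYS_BLOCK = frozenset({"deny", "require_approval"})
CONDITIONAL_BLOCK = frozenset({"deny_unless_brian_account", "deny_unless_inbox"})
NEVER_BLOCK = frozenset(
    {"require_account_check", "code_enforced", "audit", "warn", "advisory"}
)


def exit_code_for(decision: str, condition_met: bool = False) -> int:
    """Map a boundary decision to a hook exit code.

    condition_met is True when the decision's "IF ..." clause holds,
    e.g. the write target lies outside the inbox for deny_unless_inbox.
    """
    if decision in ALWAYS_BLOCK:
        return EXIT_BLOCK
    if decision in CONDITIONAL_BLOCK:
        return EXIT_BLOCK if condition_met else EXIT_ALLOW
    if decision in NEVER_BLOCK:
        return EXIT_ALLOW
    # Assumption: an unrecognised decision is treated as a configuration error.
    raise ValueError(f"unknown decision: {decision!r}")
```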

Hook spec (Claude Code contract)

Bypass-hardened detection (Item 4, landed 2026-05-03)

The hook detects "is this an outbound call to a paid-LLM endpoint with auth attached?" via heuristic substring matching. Naive substring matching is bypassable; the hardening normalizes input first.

normalize_for_matching()

Applied in order:

  1. Strip zero-width formatting characters (U+200B–U+200D, BOM, word joiner): prevents api.anth[U+200B]ropic.com evasion.
  2. NFKC normalization: collapses fullwidth Latin (ａｐｉ → api), ligatures, compatibility decompositions.
  3. URL-decode (one pass): catches https://api.anthropic.com/%76%31/messages.
  4. Confusables map: Cyrillic / Greek lookalikes mapped to Latin (а U+0430 → a, ο U+03BF → o, etc.). The mapping is narrow and covers only letters that appear in domains we care about.
  5. IDN punycode decode: xn--... labels are decoded so the confusables/NFKC passes can reach them.
  6. Lowercase.
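
The six steps above can be sketched with the Python standard library as follows. This is an illustrative reconstruction, not the code in arg_policy_hook.py: the confusables table is a small sample, the helper names are hypothetical, and re-applying the confusables map to decoded punycode labels is an assumption about how step 5 lets earlier passes "reach" those labels.

```python
import re
import unicodedata
from urllib.parse import unquote

# Step 1: zero-width space / non-joiner / joiner, word joiner, BOM.
_ZERO_WIDTH = dict.fromkeys([0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF])

# Step 4: sample confusables only (assumption: the real map is curated separately).
_CONFUSABLES = str.maketrans({
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
    "\u0441": "c",  # Cyrillic es
    "\u0456": "i",  # Cyrillic Byelorussian-Ukrainian i
    "\u03b1": "a",  # Greek alpha
    "\u03bf": "o",  # Greek omicron
})

# Step 5: ASCII-only match of IDN labels such as xn--...
_PUNYCODE_LABEL = re.compile(r"xn--([a-z0-9-]+)", re.ASCII | re.IGNORECASE)


def _decode_label(match: re.Match) -> str:
    """Decode one xn-- label; leave it untouched if it is not valid punycode."""
    try:
        decoded = match.group(1).encode("ascii").decode("punycode")
    except UnicodeError:
        return match.group(0)
    # Assumption: decoded labels get the confusables map applied again.
    return decoded.translate(_CONFUSABLES)


def normalize_for_matching(text: str) -> str:
    text = text.translate(_ZERO_WIDTH)               # 1. strip zero-width
    text = unicodedata.normalize("NFKC", text)       # 2. NFKC
    text = unquote(text)                             # 3. URL-decode, one pass
    text = text.translate(_CONFUSABLES)              # 4. confusables map
    text = _PUNYCODE_LABEL.sub(_decode_label, text)  # 5. punycode decode
    return text.lower()                              # 6. lowercase
```

Substring checks for hosts and auth signals would then run against the returned string rather than the raw tool input.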

Outbound-tool scoping

Paid-LLM and money detection only fire on tools that can actually make HTTP calls:

OUTBOUND_TOOLS = Bash, WebFetch, mcp__fetch__fetch,
                 mcp__brightdata__*, mcp__firecrawl__*,
                 + any tool whose name contains "browser",
                   "playwright", "claude-in-chrome", "chrome-devtools"

Source-editing tools (Write, Edit, Read, NotebookEdit) are excluded — otherwise the hook false-positives on documentation, test corpora, and source code that mentions paid-API URLs (this exact ARG doc set would otherwise be unwriteable).
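
A minimal sketch of the scoping check, assuming the tool list above is matched as exact names, name prefixes (for the `*` entries), and substrings. The constant and function names are hypothetical, and the case-insensitive substring match is an assumption.

```python
# Exact tool names that can make HTTP calls.
OUTBOUND_EXACT = frozenset({"Bash", "WebFetch", "mcp__fetch__fetch"})
# Prefixes standing in for the mcp__brightdata__* and mcp__firecrawl__* wildcards.
OUTBOUND_PREFIXES = ("mcp__brightdata__", "mcp__firecrawl__")
# Any tool whose name contains one of these is treated as a browser-style tool.
OUTBOUND_SUBSTRINGS = ("browser", "playwright", "claude-in-chrome", "chrome-devtools")


def is_outbound_tool(tool_name: str) -> bool:
    """Return True if paid-LLM and money detection should run for this tool."""
    if tool_name in OUTBOUND_EXACT:
        return True
    if tool_name.startswith(OUTBOUND_PREFIXES):
        return True
    lowered = tool_name.lower()  # assumption: substring match ignores case
    return any(fragment in lowered for fragment in OUTBOUND_SUBSTRINGS)
```

With this shape, `Write`, `Edit`, `Read`, and `NotebookEdit` fall through every branch and are never scanned for paid-API URLs.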

Auth-signal co-presence

A paid-LLM URL alone is not enough — the hook also requires auth material in the same call:

auth_signals = "bearer sk-",          # Authorization: Bearer sk-...
               "sk-ant-", "sk-proj-", "sk-or-",   # provider key prefixes
               "x-api-key",            # Anthropic header (with or without colon)
               "anthropic-api-key",    # alternate
               "api_key=sk-"           # query-string form

Codex/Claude-Code CLIs invoked over Plus subscription auth (no API key) pass through cleanly.
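
The co-presence rule can be sketched as two substring scans joined by a logical AND. The host tuple below contains only the two hosts named in the test corpus and is an assumption about the real list; the function name is hypothetical, and the input is expected to be already normalized and lowercased.

```python
# Assumption: only the hosts that appear in the test corpus; the real list may be longer.
PAID_LLM_HOSTS = ("api.anthropic.com", "api.openai.com")

# Auth signals as listed above, all lowercase to match normalized input.
AUTH_SIGNALS = (
    "bearer sk-",
    "sk-ant-",
    "sk-proj-",
    "sk-or-",
    "x-api-key",
    "anthropic-api-key",
    "api_key=sk-",
)


def is_paid_llm_call(normalized_text: str) -> bool:
    """True only when a paid-LLM host AND auth material appear in the same call."""
    has_host = any(host in normalized_text for host in PAID_LLM_HOSTS)
    has_auth = any(signal in normalized_text for signal in AUTH_SIGNALS)
    return has_host and has_auth
```

Requiring both conditions is what lets a bare `--model` flag or a docs URL pass: each supplies at most one of the two signals.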

Bypass test corpus

/opt/agent/scripts/test_arg_policy_hook.py — 12 cases, all passing as of 2026-05-03:

| Case | Expected | Why |
| --- | --- | --- |
| Plain api.anthropic.com + bearer | deny | baseline |
| api.openai.com + bearer | deny | baseline |
| Cyrillic аpi.anthropic.com + bearer | deny | confusables |
| Fullwidth ａｐｉ.openai.com + bearer | deny | NFKC |
| URL-encoded /v1/ → /%76%31/ | deny | URL-decode |
| Zero-width inserted in domain | deny | strip |
| api.stripe.com/v1/charges | require_approval | money policy |
| Bare --model gpt-5.4 (no endpoint) | allow | subscription auth |
| docs.anthropic.com (no auth) | allow | docs reference |
| Plain Read of a file | allow | not outbound |
| mail.google.com/?authuser=jonahtebaa@gmail.com via browser tool | deny | personal-account boundary |
| Write source mentioning api hosts | allow | outbound scoping |

Run: /opt/agent/venv/bin/python3 /opt/agent/scripts/test_arg_policy_hook.py. Expect 12/12 passed.

What the hook actively enforces (as of 2026-05-03)

| Boundary id | Decision | Trigger |
| --- | --- | --- |
| boundary.no_paid_model_calls | deny | outbound call to paid-LLM host + auth signal |
| boundary.no_real_money_outflow_without_ask | require_approval | outbound call matching money patterns (Stripe charges, payment intents) |
| boundary.no_jonah_personal_gmail_via_browser | require_approval | outbound browser tool + Jonah's gmail address present |
| boundary.subagents_readonly_on_arg | deny | Write/Edit under /root/.claude/system/ + not main Brian + not under inbox/ |

WA-recipient guard is code_enforced via wa_send_guard.py, not this hook.
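
The boundary.subagents_readonly_on_arg trigger listed above could be checked roughly as follows. This is a sketch under stated assumptions: the function name is hypothetical, the inbox is identified by the path fragment /observability/inbox/, and how the hook decides that the actor is "main Brian" is not described in this document, so it is passed in as a plain boolean.

```python
import posixpath

ARG_ROOT = "/root/.claude/system"
INBOX_FRAGMENT = "/observability/inbox/"  # assumption: matched as a path fragment
WRITE_TOOLS = ("Write", "Edit")


def violates_subagent_readonly(tool_name: str, file_path: str, is_main_actor: bool) -> bool:
    """True when a non-main actor writes under ARG_ROOT but outside the inbox."""
    if tool_name not in WRITE_TOOLS or is_main_actor:
        return False
    # Collapse "." and ".." segments so a path cannot pretend to be in the inbox.
    path = posixpath.normpath(file_path)
    under_root = path.startswith(ARG_ROOT + "/")
    in_inbox = INBOX_FRAGMENT in path + "/"
    return under_root and not in_inbox
```

Normalizing the path before matching means a target such as `/root/.claude/system/observability/inbox/../../x` is judged by where it actually lands, not by the fragments it passes through.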

Threat model — explicit

The hook protects against honest mistakes, not adversarial bypass. The only consumer is Brian himself, prompted by Jonah. There is no third party crafting Cyrillic URLs to slip past the hook.

The hardening exists because:
1. Auto-generated tool calls can pick up encoding artifacts (URL escapes, compatibility-form characters from copy-paste that NFKC folds back to plain text).
2. Phase 5 is planned to ship ARG as a public OSS template, and the threat model widens at that point.

What the hook does NOT defend against:
- IP-literal bypass (https://198.51.100.4/v1/messages) — would require maintaining a paid-provider IP allowlist; deferred.
- Provider DNS exfiltration via *.anthropic-mirror.example — requires curated denylist; deferred.
- Tool-chain injection (one tool's output becomes another's input bypassing the hook) — Claude Code runs every tool call through the hook independently, so this is not a real bypass.