/opt/agent/scripts/arg_policy_hook.py is a PreToolUse hook wired into ~/.claude/settings.json. Every tool call goes through it before execution. The hook reads policies/boundaries.json at runtime (no Claude Code restart needed when boundaries change).

| Decision | Hook behavior |
|---|---|
| deny | exit 2, reason printed to stderr (model sees it) |
| require_approval | exit 2, reason printed (Jonah must enable explicitly via direct ask) |
| require_account_check | exit 0, stderr advisory (caller is reminded to verify) |
| deny_unless_brian_account | exit 2 IF call clearly targets Jonah's personal asset |
| deny_unless_inbox | exit 2 IF Write target is /root/.claude/system/ and not under /observability/inbox/ AND actor is not main Brian |
| code_enforced | exit 0 (already handled in code, e.g. wa_send_guard.py for WhatsApp) |
| audit | exit 0 + journal an event |
| warn / advisory | exit 0 + stderr nudge |

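
The decision table can be read as a small dispatch from decision name to exit code. The following is a minimal sketch of that mapping, not the hook's actual code: the function name `exit_code_for` and the `condition_met` flag (standing in for the "IF ..." checks on the conditional decisions) are hypothetical, and journaling for `audit` is omitted.

```python
import sys

# Decisions that always block the tool call.
BLOCKING = {"deny", "require_approval"}
# Decisions that block only when their stated condition holds.
CONDITIONAL = {"deny_unless_brian_account", "deny_unless_inbox"}
# Decisions that let the call through but print a reminder to stderr.
ADVISORY = {"require_account_check", "warn", "advisory"}


def exit_code_for(decision: str, reason: str, condition_met: bool = False) -> int:
    """Return the exit code for a decision; 2 blocks the call, 0 allows it.

    `condition_met` is a hypothetical flag for the conditional decisions.
    """
    if decision in BLOCKING or (decision in CONDITIONAL and condition_met):
        print(reason, file=sys.stderr)  # the model sees stderr on exit 2
        return 2
    if decision in ADVISORY:
        print(reason, file=sys.stderr)  # nudge only, call still proceeds
    # code_enforced, audit, and anything unrecognized fall through to allow.
    return 0
```

Note that this sketch allows unrecognized decision names; whether the real hook fails open or closed in that case is not stated here.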

Each tool call reaches the hook as a JSON payload of the form {"tool_name": "...", "tool_input": {...}}.

The hook detects "is this an outbound call to a paid-LLM endpoint with auth attached?" via heuristic substring matching. Naive substring matching is bypassable; the hardening normalizes input first.

normalize_for_matching()

Applied in order:

1. Zero-width strip (U+200B–U+200D, BOM, word-joiner): prevents evasion by hiding invisible characters inside api.anthropic.com.
2. NFKC normalization: fullwidth forms (ａｐｉ → api), ligatures, compatibility decompositions.
3. URL-decode: catches percent-encoded paths such as https://api.anthropic.com/%76%31/messages.
4. Confusables map (а U+0430 → a, ο U+03BF → o, etc.). Narrow mapping, only letters that appear in domains we care about.
5. Punycode: xn--... labels decoded so the confusables/NFKC pass can reach them.

Paid-LLM and money detection only fire on tools that can actually make HTTP calls:
OUTBOUND_TOOLS = Bash, WebFetch, mcp__fetch__fetch,
mcp__brightdata__*, mcp__firecrawl__*,
+ any tool whose name contains "browser",
"playwright", "claude-in-chrome", "chrome-devtools"
Source-editing tools (Write, Edit, Read, NotebookEdit) are excluded — otherwise the hook false-positives on documentation, test corpora, and source code that mentions paid-API URLs (this exact ARG doc set would otherwise be unwriteable).
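
The outbound scoping above could be expressed roughly as follows. This is a sketch under assumptions: the helper name `is_outbound_tool` is hypothetical, and matching the name fragments case-insensitively is a choice made here, not something the hook is known to do.

```python
from fnmatch import fnmatchcase

# Exact names and glob patterns taken from the OUTBOUND_TOOLS list.
OUTBOUND_PATTERNS = (
    "Bash",
    "WebFetch",
    "mcp__fetch__fetch",
    "mcp__brightdata__*",
    "mcp__firecrawl__*",
)
# Any tool whose name contains one of these fragments also counts as outbound.
OUTBOUND_FRAGMENTS = ("browser", "playwright", "claude-in-chrome", "chrome-devtools")


def is_outbound_tool(tool_name: str) -> bool:
    """True if the tool can plausibly make HTTP calls (hypothetical helper)."""
    if any(fnmatchcase(tool_name, pattern) for pattern in OUTBOUND_PATTERNS):
        return True
    lowered = tool_name.lower()  # assumption: fragment match ignores case
    return any(fragment in lowered for fragment in OUTBOUND_FRAGMENTS)
```

With this shape, Write, Edit, Read, and NotebookEdit never reach the paid-LLM or money checks, which is what keeps documentation and test corpora writable.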
A paid-LLM URL alone is not enough — the hook also requires auth material in the same call:
auth_signals = "bearer sk-", # Authorization: Bearer sk-...
"sk-ant-", "sk-proj-", "sk-or-", # provider key prefixes
"x-api-key", # Anthropic header (with or without colon)
"anthropic-api-key", # alternate
"api_key=sk-" # query-string form
Codex/Claude-Code CLIs invoked over Plus subscription auth (no API key) pass through cleanly.
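
Putting normalization and the auth requirement together, detection might look like the sketch below. It is illustrative only: the confusables table is a guess at a narrow mapping, `PAID_LLM_HOSTS` lists just the two hosts named in the test table, `looks_like_paid_llm_call` is a hypothetical name, and the sketch decodes `xn--` labels before applying the confusables map so decoded labels get mapped, which may differ from the real hook's ordering.

```python
import re
import unicodedata
from urllib.parse import unquote

# Zero-width space/non-joiner/joiner, word joiner, BOM; None means "delete".
ZERO_WIDTH = dict.fromkeys([0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF])

# Illustrative narrow confusables map (assumption, not the hook's real table).
CONFUSABLES = str.maketrans({
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
    "\u0441": "c",  # Cyrillic es
    "\u03bf": "o",  # Greek omicron
})

PAID_LLM_HOSTS = ("api.anthropic.com", "api.openai.com")  # assumed subset
AUTH_SIGNALS = (
    "bearer sk-", "sk-ant-", "sk-proj-", "sk-or-",
    "x-api-key", "anthropic-api-key", "api_key=sk-",
)


def _decode_punycode_label(match: re.Match) -> str:
    """Decode one xn-- label; leave it untouched if it is not valid punycode."""
    try:
        return match.group(1).encode("ascii").decode("punycode")
    except ValueError:
        return match.group(0)


def normalize_for_matching(text: str) -> str:
    text = text.translate(ZERO_WIDTH)            # strip invisible characters
    text = unicodedata.normalize("NFKC", text)   # fullwidth, ligatures
    text = unquote(text).lower()                 # percent-escapes, case
    text = re.sub(r"xn--([a-z0-9-]+)", _decode_punycode_label, text)
    return text.translate(CONFUSABLES)           # look-alike letters


def looks_like_paid_llm_call(raw: str) -> bool:
    """A paid-LLM host alone is not enough; an auth signal must also appear."""
    text = normalize_for_matching(raw)
    has_host = any(host in text for host in PAID_LLM_HOSTS)
    has_auth = any(signal in text for signal in AUTH_SIGNALS)
    return has_host and has_auth
```

Requiring both signals is what lets a bare mention of docs.anthropic.com, or a CLI running on subscription auth, pass without a deny.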
/opt/agent/scripts/test_arg_policy_hook.py — 12 cases, all passing as of 2026-05-03:

| Case | Expected | Why |
|---|---|---|
| Plain api.anthropic.com + bearer | deny | baseline |
| api.openai.com + bearer | deny | baseline |
| Cyrillic аpi.anthropic.com + bearer | deny | confusables |
| Fullwidth ａｐｉ.openai.com + bearer | deny | NFKC |
| URL-encoded /v1/ → /%76%31/ | deny | URL-decode |
| Zero-width inserted in domain | deny | strip |
| api.stripe.com/v1/charges | require_approval | money policy |
| Bare --model gpt-5.4 (no endpoint) | allow | subscription auth |
| docs.anthropic.com (no auth) | allow | docs reference |
| Plain Read of a file | allow | not outbound |
| mail.google.com/?authuser=jonahtebaa@gmail.com via browser tool | deny | personal-account boundary |
| Write source mentioning api hosts | allow | outbound scoping |

Run: /opt/agent/venv/bin/python3 /opt/agent/scripts/test_arg_policy_hook.py. Expect 12/12 passed.

| Boundary id | Decision | Trigger |
|---|---|---|
| boundary.no_paid_model_calls | deny | outbound call to paid-LLM host + auth signal |
| boundary.no_real_money_outflow_without_ask | require_approval | outbound call matching money patterns (Stripe charges, payment intents) |
| boundary.no_jonah_personal_gmail_via_browser | require_approval | outbound browser tool + Jonah's gmail address present |
| boundary.subagents_readonly_on_arg | deny | Write/Edit under /root/.claude/system/ + not main Brian + not under inbox/ |

WA-recipient guard is code_enforced via wa_send_guard.py, not this hook.
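
For the boundary.subagents_readonly_on_arg trigger, the path test could be sketched as below. Several things are assumed here: the inbox is taken to live at /root/.claude/system/observability/inbox, the `is_main_agent` flag stands in for however the hook identifies main Brian, and the helper name `violates_readonly` is hypothetical.

```python
import posixpath
from pathlib import PurePosixPath

SYSTEM_ROOT = PurePosixPath("/root/.claude/system")
# Assumed inbox location; the doc only gives the /observability/inbox/ suffix.
INBOX = SYSTEM_ROOT / "observability" / "inbox"


def violates_readonly(tool_name: str, file_path: str, is_main_agent: bool) -> bool:
    """True if a non-main actor tries to Write/Edit under the system tree
    outside the inbox (hypothetical helper, requires Python 3.9+)."""
    if tool_name not in ("Write", "Edit") or is_main_agent:
        return False
    # Collapse ".." segments so inbox/../../x cannot masquerade as an inbox path.
    target = PurePosixPath(posixpath.normpath(file_path))
    return target.is_relative_to(SYSTEM_ROOT) and not target.is_relative_to(INBOX)
```

Normalizing the path before the prefix check is a design choice of the sketch: a plain string-prefix test would accept traversal paths that start with the inbox prefix.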
The hook protects against honest mistakes, not adversarial bypass. The only consumer is Brian himself, prompted by Jonah. There is no third party crafting Cyrillic URLs to slip past the hook.
The hardening exists because:
1. Auto-generated tool calls can pick up encoding artifacts (URL escapes, NFKC dirtiness from copy-paste).
2. Phase 5 will ship ARG as a public OSS template, at which point the threat model widens.
What the hook does NOT defend against:
- IP-literal bypass (https://198.51.100.4/v1/messages) — would require maintaining a paid-provider IP allowlist; deferred.
- Provider DNS exfiltration via *.anthropic-mirror.example — requires curated denylist; deferred.
- Tool-chain injection (one tool's output becomes another's input bypassing the hook) — Claude Code runs every tool call through the hook independently, so this is not a real bypass.