2026-05-03 06:45 (Beirut) (backfill from DOCUMENTATION/)

10 — Roadmap

Phase log

| Phase | Date | What |
|---|---|---|
| 1 | 2026-05-02 morning | Skeleton + seed JSON files |
| 2 | 2026-05-02 evening | Structured registry + probes + resolver + journal + inbox + hook + belt+suspenders discovery |
| 2.5 | 2026-05-02 evening (overnight) | 28 audit fixes; ARG cron live (probe / autofix / heal-npx / journal-rotate); flap detection; endpoint+auth co-presence in policy hook; sanitizer idempotence-test mode |
| 2.6 | 2026-05-02 → 03 overnight | Smoke-test bug-fix wave (3 boundary semantics fixes); 23 new probes wired; Phase C atom audit; cleanup |
| 5-Item-3 | 2026-05-03 ~03:00 Beirut | jsonschema validation wired into arg validate (5 schema files + common refs) |
| 5-Item-4 | 2026-05-03 ~03:15 Beirut | Hook bypass hardening (NFKC + confusables + URL-decode + IDN + zero-width); outbound-tool scoping; 12-case bypass corpus |
| 5-Item-7 | 2026-05-03 ~03:30 Beirut | Per-category invariants (critical-needs-probe, grant-parties-resolve, money-cap-needs-money-policy, deny_unless_brian_account scope sanity) |
| Docs | 2026-05-03 ~03:40 Beirut | This documentation set written to DOCUMENTATION/ARG/ |

Phases 1 and 2, 2.5 and 2.6, and Phase 5 items 3, 4, and 7 have all shipped. Smoke tests are passing. Status was 51/0/6/210 at handoff start and 71/0/13/183 after all overnight work; the +7 reds are honest probe-surfaced gaps (Meta rate-limit, Reddit creds gap, Mac CDP transient).
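
The hook bypass hardening listed in the phase log (NFKC, confusables, URL-decode, IDN, zero-width) amounts to canonicalizing input before any rule matches it. The Python sketch below illustrates that idea only; it is not ARG's actual hook code. The function names, the deliberately tiny confusables map, and the fixed-point loop are all assumptions made for illustration.

```python
import unicodedata
from urllib.parse import unquote

# Invisible characters often used to split a keyword so a naive match misses it.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

# Hypothetical, deliberately tiny confusables map (Cyrillic lookalikes only).
# A real table would be far larger.
CONFUSABLES = {
    "\u0430": "a",
    "\u0435": "e",
    "\u043e": "o",
    "\u0440": "p",
    "\u0441": "c",
}


def _one_pass(text: str) -> str:
    """Apply each normalization step once."""
    text = unquote(text)                        # URL-decode (%61 -> a)
    text = unicodedata.normalize("NFKC", text)  # fold fullwidth/compat forms
    text = "".join(ch for ch in text if ch not in ZERO_WIDTH)
    text = "".join(CONFUSABLES.get(ch, ch) for ch in text)
    return text.casefold()


def canonicalize(text: str, max_rounds: int = 5) -> str:
    """Repeat the pass until the output stops changing (a fixed point).

    Looping handles layered tricks such as double URL-encoding. Input that
    never stabilizes is rejected rather than matched in a half-decoded state.
    """
    for _ in range(max_rounds):
        nxt = _one_pass(text)
        if nxt == text:
            return text
        text = nxt
    raise ValueError("input did not stabilize; treat as hostile")


def canonical_host(host: str) -> str:
    """Canonicalize a hostname, then convert it to its ASCII (punycode) form."""
    return canonicalize(host).encode("idna").decode("ascii")
```

Because `canonicalize` only returns once a pass leaves the text unchanged, calling it on its own output is a no-op, which is the property an idempotence test would check.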

Next (not yet started)

Phase 3 — broader hook coverage + smarter autofix

Goals:
- Expand boundary hook coverage (more rules with hard enforcement).
- Idempotency-aware autofix runners (currently a flat list; could be per-class with retry-window heuristics).
- Add an arg remove subcommand for clean row deletion (today this requires a manual JSON edit).
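
One way to picture the per-class autofix idea from the goals above is a gate that looks up a retry policy by failure class and refuses to rerun a fix too often inside a time window. This is a sketch under stated assumptions, not the planned design: the class names (`transient`, `credentials`), the limits, and the `AutofixGate` API are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    max_attempts: int      # attempts allowed inside one window
    window_seconds: float  # length of the sliding window


# Hypothetical failure classes and limits, for illustration only.
POLICIES = {
    "transient": RetryPolicy(max_attempts=3, window_seconds=900),
    "credentials": RetryPolicy(max_attempts=1, window_seconds=86400),
}
DEFAULT_POLICY = RetryPolicy(max_attempts=1, window_seconds=3600)


class AutofixGate:
    """Decides whether an autofix may run, per (capability, failure class)."""

    def __init__(self) -> None:
        # (cap_id, failure_class) -> timestamps of attempts still in the window
        self._attempts: dict[tuple[str, str], list[float]] = {}

    def should_run(self, cap_id: str, failure_class: str, now: float) -> bool:
        policy = POLICIES.get(failure_class, DEFAULT_POLICY)
        key = (cap_id, failure_class)
        # Keep only attempts that are still inside the sliding window.
        recent = [t for t in self._attempts.get(key, [])
                  if now - t < policy.window_seconds]
        allowed = len(recent) < policy.max_attempts
        if allowed:
            recent.append(now)  # record this attempt
        self._attempts[key] = recent
        return allowed
```

A caller would pass the current time explicitly (for example `time.time()`), which keeps the gate deterministic and easy to test.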

Phase 4 — capability miner full loop

Goals:
- Miner is wired but conservative. Move it from "drop into inbox with approval_required=True" to "auto-promote when N+ successful resolves, K+ days steady, no policy_block events."
- Reverse-direction miner: surface caps in registry that have NEVER resolved successfully (dead caps).
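
The two miner goals above can be sketched as a pair of small functions over journal events. Everything here is an assumption made for illustration: the `Event` shape, the `resolve_ok` event kind, the default thresholds standing in for N and K, and the reading of "K+ days steady" as "the first successful resolve is at least K days old". Only the `policy_block` event name comes from the text above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    cap_id: str
    kind: str      # e.g. "resolve_ok" (hypothetical) or "policy_block"
    at: datetime


def should_auto_promote(events, cap_id, now, min_resolves=5, min_days=3):
    """True when a mined capability meets all three promotion conditions."""
    mine = [e for e in events if e.cap_id == cap_id]
    # Any policy_block event disqualifies the capability outright.
    if any(e.kind == "policy_block" for e in mine):
        return False
    oks = [e.at for e in mine if e.kind == "resolve_ok"]
    # Need at least N successful resolves.
    if len(oks) < min_resolves:
        return False
    # "Steady" is approximated as: first success is at least K days old.
    return now - min(oks) >= timedelta(days=min_days)


def dead_caps(registry_ids, events):
    """Reverse direction: registry capabilities with no successful resolve."""
    alive = {e.cap_id for e in events if e.kind == "resolve_ok"}
    return sorted(set(registry_ids) - alive)
```

Keeping promotion as a pure function of the event list means the same check can be replayed against the journal to explain why a capability was or was not promoted.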

Phase 5 — Sanitize + ship as OSS template

The big one. Items 3 (jsonschema) + 4 (bypass-hardening) + 7 (invariants) already landed; remaining:
- Round-table review (in flight as of this doc — see ROUND_TABLE_REVIEW.md once complete).
- Pick license (MIT confirmed by Jonah 2026-05-03).
- Write a 1-page README aimed at outsiders.
- Push sanitized tree to a public repo.
- Optional: a 5-min Loom-style walkthrough video.

Sanitizer is idempotent. Schemas exist. Threat model documented. Most of the work for Phase 5 is presentation, not code.

Deferred items (not in active roadmap)

| Item | Why not now |
|---|---|
| IP-literal allowlist for paid providers | Threat model doesn't justify the maintenance burden of keeping IP lists fresh. |
| Adversarial DNS denylist (*.anthropic-mirror.example) | Same reason: internal threat model only. |
| Multi-writer support | Would require locking + transactions; single-writer invariant is a feature, not a limitation, at current scale. |
| Vector-search over events journal | Current arg events grep is fast (gzip-aware). At >100k events/day, revisit. |
| Web UI | A markdown view + CLI is fine for one-agent scale. |
| ARG-as-MCP-server | Would let other Claude Code projects share the registry. Defer until at least 2 projects need it. |

Success criteria

ARG is "done" when:
- ✅ Brian has a deterministic answer to "can I do X?" before attempting X.
- ✅ Brian's hard rules are enforced at runtime, not just documented.
- ✅ The registry survives a fresh session start without re-explanation.
- ✅ Sub-agents can propose changes without breaking the single-writer invariant.
- ✅ Probes verify reality on a cadence; flap detection prevents storms.
- ✅ Bypass attempts via Unicode tricks fail.
- ⏳ The system can be sanitized and shipped as a template another agent could fork.

The last criterion is what Phase 5 closes.