2026-05-03 06:54 (Beirut) (backfill from DOCUMENTATION/)

Round-Table Review — ARG, pre-sanitize

Date: 2026-05-03 ~03:50 Beirut
Reviewers dispatched: codex (OpenAI Codex CLI), hermes (Nous), gemini (Google), jules, antigravity
Reviewers usable: codex (full review), hermes (partial; some hallucinations)
Reviewers failed: gemini (workspace sandbox blocked file access), jules (CLI invocation mismatch — fed prompt as command), antigravity (refused root with no sandbox flag)
Raw outputs: /tmp/arg_review/{codex,hermes,gemini,jules,antigravity}.md

Overall verdict

Codex: major-rework-needed before OSS publication.
Hermes: fix-P1-then-ship.
Brian's read on the consolidated set: fix-P0-and-key-P1-then-ship. Codex's P0 list is real and credible; every finding has a checkable file:line. Most are not deep architectural problems, just gaps that didn't surface in private use because there's only one operator. Estimated remediation: about 6 hours in total (see the remediation plan below).


P0 — must fix before sanitize+ship

1. Sanitizer misses account/opaque IDs (codex)

2. ARG docs are outside the sanitizer (codex)

3. Policy hook fails open if boundaries can't load (codex)
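
A minimal sketch of the fail-closed behaviour this finding asks for, assuming the hook reads its boundaries from a JSON file and returns an allow/deny decision. The names `load_boundaries`, `decide`, and the `blocked_tools` key are hypothetical; the real hook's interface is not shown in this review.

```python
import json
from pathlib import Path


def load_boundaries(path):
    """Return the parsed boundaries dict, or None if it cannot be loaded."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing file, unreadable file, bad encoding, or invalid JSON.
        return None
    return data if isinstance(data, dict) else None


def decide(tool_name, boundaries_path):
    """Return (allowed, reason). Hypothetical hook entry point."""
    boundaries = load_boundaries(boundaries_path)
    if boundaries is None:
        # Fail closed: with no readable policy, deny everything.
        return False, "boundaries unavailable; denying by default"
    if tool_name in boundaries.get("blocked_tools", []):
        return False, f"{tool_name} is blocked by policy"
    return True, "allowed"
```

The point of the sketch is the `boundaries is None` branch: a load failure becomes a denial rather than an implicit allow.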

4. Sub-agents can bypass read-only ARG via CLI (codex)

5. Critical probe cron ignores freshness AND min_interval_seconds (codex)
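
A rough sketch of the gate a cron-driven probe run could apply before executing each probe. It assumes `min_interval_seconds` is the minimum spacing between runs and that freshness is a window during which the last result still counts as current; both semantics, and the names `should_probe`, `last_run`, and `freshness_seconds`, are assumptions rather than ARG's actual definitions.

```python
import time


def should_probe(last_run, freshness_seconds, min_interval_seconds, now=None):
    """Decide whether a probe is due.

    last_run is the Unix timestamp of the previous run, or None if the
    probe has never run.
    """
    if last_run is None:
        return True
    now = time.time() if now is None else now
    age = now - last_run
    if age < min_interval_seconds:
        return False  # rate limit: the previous run was too recent
    return age >= freshness_seconds  # re-probe only once the result is stale
```

Passing `now` explicitly keeps the function testable without patching the clock.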

6. Money hook misses money-capable MCP tools (codex)


P1 — should fix before ship

7. Hook is not actually data-driven (codex)

8. Probe command allowlist is shell-prefix based (codex + hermes)
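
One way to move off prefix matching is to parse the command into an argument vector and compare the program name exactly. The sketch below assumes probe commands arrive as single strings; `ALLOWED_PROGRAMS` and `parse_probe_command` are hypothetical names, and the allowlist contents are placeholders.

```python
import shlex

ALLOWED_PROGRAMS = {"curl", "ping", "systemctl"}  # placeholder allowlist
SHELL_METACHARS = set(";&|`$<>\n")


def parse_probe_command(command):
    """Return argv if the command is allowed, else raise ValueError."""
    if any(ch in SHELL_METACHARS for ch in command):
        raise ValueError("shell metacharacters are not allowed")
    try:
        argv = shlex.split(command)
    except ValueError as exc:  # for example, unbalanced quotes
        raise ValueError(f"unparseable command: {exc}") from exc
    # Exact match on the program name, not a string prefix.
    if not argv or argv[0] not in ALLOWED_PROGRAMS:
        raise ValueError("program is not on the allowlist")
    return argv
```

The returned list is meant to be handed to `subprocess.run(argv)` without `shell=True`, so no shell ever interprets the string. The metacharacter check is deliberately strict and would also reject URLs containing `&`; whether that trade-off is acceptable depends on the probes in use.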

9. Capability miner reads the wrong event shape (codex)

10. Events grep + miner not gzip-aware (codex)
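
A minimal sketch of gzip-aware reading, assuming event logs are UTF-8 text files that may have been rotated and compressed. `open_text` and `grep_events` are illustrative names, not the project's functions.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed log file as text."""
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == b"\x1f\x8b":  # gzip magic number
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")


def grep_events(paths, needle):
    """Yield lines containing needle from every file, compressed or not."""
    for path in paths:
        with open_text(path) as handle:
            for line in handle:
                if needle in line:
                    yield line.rstrip("\n")
```

Sniffing the magic bytes rather than trusting a `.gz` suffix means a renamed or mis-suffixed file is still read correctly.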

11. risk_at_least uses equality (codex)
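
The fix implied here is an ordinal comparison instead of an equality test. A small sketch, assuming a four-level risk scale; the level names in `RISK_ORDER` are an assumption, and only the function name comes from the finding.

```python
RISK_ORDER = ("low", "medium", "high", "critical")  # assumed scale
RISK_RANK = {name: rank for rank, name in enumerate(RISK_ORDER)}


def risk_at_least(actual, threshold):
    """True when actual is at or above threshold on the ordinal scale.

    Unknown level names raise KeyError instead of silently comparing false.
    """
    return RISK_RANK[actual] >= RISK_RANK[threshold]
```

With equality, a `critical` action would not satisfy `risk_at_least("critical", "high")`; with the ordinal version it does.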

12. IDN hardening claim is not covered (codex)

13. Sanitized template is not runnable (codex)


P2 — could fix before ship

14. jsonschema validation silently disables itself (codex)
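
A sketch of failing loudly when the validation library is missing, rather than skipping validation. It assumes the silent disable comes from a guarded import; `require_validator` is a hypothetical helper name.

```python
import importlib


def require_validator(module_name="jsonschema"):
    """Import the validation library or raise; never skip validation silently."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"{module_name} is required for schema validation; "
            "install it or disable validation explicitly"
        ) from exc
```

A caller would then use `require_validator().validate(instance=data, schema=schema)`, so a missing dependency surfaces as an error at startup instead of as unvalidated input.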

15. Hardcoded paths break OS portability (hermes — verified credible)


NIT

16. Schema versioning (hermes — verified credible)


Findings discarded (hermes)

Findings I intentionally did NOT pursue


Remediation plan

Order, with rough effort:

| Order | Item | Effort | Why first |
|---|---|---|---|
| 1 | P0 #3: fail-closed policy hook | 15 min | bug, easy fix, biggest safety win |
| 2 | P0 #5: probe-all freshness + min_interval cache | 30 min | prevents future probe-storm reds |
| 3 | P0 #4: actor check in CLI mutators | 20 min | closes sub-agent escape hatch |
| 4 | P0 #6: money-capable MCP detection | 20 min | future-proofs Stripe MCP wire-up |
| 5 | P1 #9: miner event-shape fix | 15 min | Phase 4 currently broken; tiny fix |
| 6 | P1 #10: gzip-aware grep + mine | 15 min | doc claim mismatch |
| 7 | P1 #11: risk_at_least ordinal | 10 min | semantic correctness |
| 8 | P1 #12: IDN normalization order + test case | 20 min | bypass corpus completeness |
| 9 | P1 #8: probe allowlist hardening | 30 min | shell-injection surface narrows |
| 10 | P0 #1 + P1 #13: sanitizer expansion | 45 min | the actual ship-blocker |
| 11 | P0 #2: sanitize the docs (or write fresh OSS docs) | 60 min | last step before push |
| 12 | P1 #7: generic boundary matching OR doc revision | 45 min | clarity |
| 13 | P2 #14, #15, #16: polish | 30 min | optional |

Total: ~6 hours to ship-ready state. Most items are 15-30 minutes each because the codebase is small and the bugs are surgical.

Recommendation

Do NOT sanitize+ship yet. Fix the 6 P0s + the top 4 P1s (#9, #10, #11, #12 — they're cheap and meaningful). The remaining P1s (#7, #8, #13) and the P2/NIT items can either land in a follow-up patch or be noted as "known limitations" in the OSS README.

Round-table consensus: ARG's design is sound. The implementation has cracks that only appeared because the system was built fast and tested by one operator. Surfacing them BEFORE shipping is exactly why we ran the round-table.