*/30 * * * * /opt/agent/scripts/heal_npx_cache.sh # sweep half-renamed npx cache dirs
0,30 * * * * /opt/agent/scripts/arg_probe_critical.sh # probe critical entries
15,45 * * * * /opt/agent/scripts/arg_autofix.py # auto-fix safe failure classes
10 4 * * * /opt/agent/scripts/arg_rotate_events.sh # gzip>7d, delete>90d
0 2 * * * # GEO hour (existing routine, not ARG-specific)
All on Hetzner host ubuntu-8gb-hel1-1. Cron is crontab -l for root.
The :00/:30 probe run completes before the :15/:45 autofix run, so autofix always sees fresh last_probe data. Probe + autofix complete in <30s typical (probe 16–28s, autofix 1–4s).
If a probe storm pushes runtime over 60s, the next half-hour fires anyway — probes are idempotent and autofix has flap detection, so overlapping cycles don't compound.
Some upstreams flag rapid probes. Currently rate-limited at the row level:
| Atom | min_interval_seconds | Source |
|---|---|---|
| LI account + cookies | 3600 | Phase 2 (LI session-flagging) |
| Meta page + IG + tokens | 3600 | Phase 2.6 (added 2026-05-03 after probe storm hit Meta app limit) |
Future probes that hit shared-quota providers should default min_interval_seconds: 3600 until proven otherwise.
/opt/agent/scripts/arg_sanitize.py produces a public-template tree at /tmp/arg-template-out/arg-template/:
/root/.claude/system/{resources,access,tools,capabilities,policies,routines}/*.json--idempotence-test flag: re-feeds output through sanitize(); asserts equality. Verified passing 2026-05-03.Replacement rules in REPLACEMENTS list — order matters (specific patterns first). Adding a new pattern: append to REPLACEMENTS, run --idempotence-test, fix any new leaks.
Three independent paths surface ARG to a fresh Claude Code session:
🚨 ARG (Agent Resource Graph)./opt/agent/scripts/arg_sessionstart.sh) — prints status counts + recent events to model context./ARG slash skill (/root/.claude/skills/ARG/SKILL.md) — load on demand.Discovery effectiveness reported in evening brief.
/opt/agent/scripts/capability_loader.sh is a UserPromptSubmit hook that auto-fires the using-capabilities skill when prompts exceed 80 characters. Forces a <capability-plan> block above the first tool call for any non-trivial action.
Weekly audit: /cap-refresh --weekly greps transcripts for tool calls in trigger scope without a preceding plan block; violations route to TG LOGS.
The ARG tree is included in:
- Nightly tarball at /opt/agent/backups/daily/*.tar.gz (via existing backup-daily cron).
- Off-host restic to sftp:contabo:/var/backups/hetzner-agent (nightly).
- Weekly cloud copy to /root/GoogleDrive/Mine/backups/hetzner_postgres/ Sundays.
Restoration: untar → arg validate → arg render-views. State (probe_status.json, autofix_state.json, events/) is restored along with the registry, so post-restore probes converge quickly.
| Symptom | Likely cause | Recovery |
|---|---|---|
arg validate returns N problems |
recent edit broke schema | git diff /root/.claude/system/, revert offending row |
| All probes returning ok=false | env file path wrong, /opt/agent/core/.env missing |
restore from /root/agent_config.env source-of-truth via SOPS decrypt |
| Autofix runs but never sticks | flap detection should trip at 5×; check autofix_state.json |
inspect last 50 events for the (rid, action) pair; root-cause the underlying failure |
| Sub-agent writes blocked unexpectedly | CLAUDE_ARG_MAIN=1 not set in main session |
confirm ~/.claude/settings.json has "CLAUDE_ARG_MAIN": "1" in env block |
arg events grep X returns nothing |
grep pattern needs unescaped quotes or wrong year | try without quotes: arg events grep policy_block |
| Status counts drift after manual edit | probe_status.json out of sync with row last_probe | arg probe-all resyncs |