← index2026-05-03 06:44 (Beirut)(backfill from DOCUMENTATION/)

08 — Cron Schedule & Lifecycle

08 — Cron Schedule & Lifecycle

Scheduled jobs (as of 2026-05-03)

*/30 * * * *   /opt/agent/scripts/heal_npx_cache.sh         # sweep half-renamed npx cache dirs
0,30 * * * *   /opt/agent/scripts/arg_probe_critical.sh     # probe critical entries
15,45 * * * *  /opt/agent/scripts/arg_autofix.py            # auto-fix safe failure classes
10 4 * * *     /opt/agent/scripts/arg_rotate_events.sh      # gzip>7d, delete>90d
0  2 * * *     # GEO hour (existing routine, not ARG-specific)

All on Hetzner host ubuntu-8gb-hel1-1. Cron is crontab -l for root.

Probe → autofix interleave

The :00/:30 probe run completes before the :15/:45 autofix run, so autofix always sees fresh last_probe data. Probe + autofix complete in <30s typical (probe 16–28s, autofix 1–4s).

If a probe storm pushes runtime over 60s, the next half-hour fires anyway — probes are idempotent and autofix has flap detection, so overlapping cycles don't compound.

Probe rate limiting

Some upstreams flag rapid probes. Currently rate-limited at the row level:

Atom min_interval_seconds Source
LI account + cookies 3600 Phase 2 (LI session-flagging)
Meta page + IG + tokens 3600 Phase 2.6 (added 2026-05-03 after probe storm hit Meta app limit)

Future probes that hit shared-quota providers should default min_interval_seconds: 3600 until proven otherwise.

Sanitizer lifecycle (pre-public-release)

/opt/agent/scripts/arg_sanitize.py produces a public-template tree at /tmp/arg-template-out/arg-template/:

Replacement rules in REPLACEMENTS list — order matters (specific patterns first). Adding a new pattern: append to REPLACEMENTS, run --idempotence-test, fix any new leaks.

Discovery layers

Three independent paths surface ARG to a fresh Claude Code session:

  1. MEMORY.md CORE pointer — always-loaded line: 🚨 ARG (Agent Resource Graph).
  2. SessionStart hook (/opt/agent/scripts/arg_sessionstart.sh) — prints status counts + recent events to model context.
  3. /ARG slash skill (/root/.claude/skills/ARG/SKILL.md) — load on demand.

Discovery effectiveness reported in evening brief.

Capability-first planning trigger

/opt/agent/scripts/capability_loader.sh is a UserPromptSubmit hook that auto-fires the using-capabilities skill when prompts exceed 80 characters. Forces a <capability-plan> block above the first tool call for any non-trivial action.

Weekly audit: /cap-refresh --weekly greps transcripts for tool calls in trigger scope without a preceding plan block; violations route to TG LOGS.

Backups

The ARG tree is included in:
- Nightly tarball at /opt/agent/backups/daily/*.tar.gz (via existing backup-daily cron).
- Off-host restic to sftp:contabo:/var/backups/hetzner-agent (nightly).
- Weekly cloud copy to /root/GoogleDrive/Mine/backups/hetzner_postgres/ Sundays.

Restoration: untar → arg validatearg render-views. State (probe_status.json, autofix_state.json, events/) is restored along with the registry, so post-restore probes converge quickly.

Failure modes & recovery

Symptom Likely cause Recovery
arg validate returns N problems recent edit broke schema git diff /root/.claude/system/, revert offending row
All probes returning ok=false env file path wrong, /opt/agent/core/.env missing restore from /root/agent_config.env source-of-truth via SOPS decrypt
Autofix runs but never sticks flap detection should trip at 5×; check autofix_state.json inspect last 50 events for the (rid, action) pair; root-cause the underlying failure
Sub-agent writes blocked unexpectedly CLAUDE_ARG_MAIN=1 not set in main session confirm ~/.claude/settings.json has "CLAUDE_ARG_MAIN": "1" in env block
arg events grep X returns nothing grep pattern needs unescaped quotes or wrong year try without quotes: arg events grep policy_block
Status counts drift after manual edit probe_status.json out of sync with row last_probe arg probe-all resyncs