What it is: a no-paid-model agent that takes a client brief, retrieves the closest past Webspot proposals, generates pricing + content, and outputs a finished branded PDF proposal by editing a copy of the master Google Slides deck. (Note: the design layer is Google Slides, not Canva — Webspot's existing 54-slide A4 portrait master is the source of visual fidelity.)
Where it lives: /opt/agent/webspot_proposal_agent/ on Hetzner (ubuntu-8gb-hel1-1).
Status (2026-05-12): ingestion live (251 proposals indexed), master deck built with anchor placeholders, end-to-end generation proven on the ABC Store client (4 rounds, R4 = clean).
Drop a markdown file describing the client and services into data/. Example: data/abc_store_brief_intent.md. It must include:
cd /opt/agent/webspot_proposal_agent
./venv/bin/python cli.py search "AI training + customer service agent for retail"
./venv/bin/python cli.py search "ecommerce site rebuild" --json
Returns top-5 closest past proposals + top-12 most relevant sections (pricing, scope, deliverables, etc.). These become reference material when drafting.
./venv/bin/python scripts/06_generate_abc_store_v2.py <round_number>
What that script does, in order:
- Copies the master deck WEBSPOT_PROPOSAL_MASTER_v1 (Slides ID 1PlMaMn2sOAkqy1GJNnsj292zShEolrCgHLezCVopSFA).
- Writes a manifest to data/generated_drafts/YYYY-MM-DD_<client>_proposal_r<N>_manifest.json.

./venv/bin/python scripts/07_audit_abc_store.py <round_number>
Extracts text, flags banned phrases (other client names, off-brief services like "ComfyUI workshop"), runs visual checks, writes AUDIT.md. Loop rounds until clean.
PDF lands in data/generated_drafts/ and is mirrored to the BRIAN SHARED Drive folder so it appears on Jonah's Mac within ~4s.
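The per-round naming convention for manifests can be sketched as a small helper. This is an illustration only: `manifest_path` and the `client_slug` argument are hypothetical names, and the real generator may build its paths differently.

```python
from datetime import date
from pathlib import Path

# Output directory named in this README.
DRAFTS_DIR = Path("data/generated_drafts")


def manifest_path(client_slug: str, round_number: int, day: date) -> Path:
    """Build YYYY-MM-DD_<client>_proposal_r<N>_manifest.json for one round.

    client_slug is a hypothetical filesystem-safe client name, e.g. "abc_store".
    """
    stem = f"{day.isoformat()}_{client_slug}_proposal_r{round_number}"
    return DRAFTS_DIR / f"{stem}_manifest.json"
```

Keeping the round number in the filename means each round's output sits side by side, so earlier rounds stay available for comparison.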
./venv/bin/python cli.py ingest # full Drive walk (idempotent)
./venv/bin/python cli.py ingest --limit 10 # smoke test
./venv/bin/python cli.py report # re-emit INGESTION_REPORT.md
./venv/bin/python cli.py notify # send TG COMMS completion notice
Drive corpus (275 PDFs) → Retrieval (Qdrant, bge-m3 local) → Generation (Slides API, anchor replacement) → PDF (audit)
Ingestion (cli.py ingest)

| Step | File | Detail |
|---|---|---|
| Walk Drive | ingest/drive_ingest.py | Recursive via service account webspot-proposal-agent@gen-lang-client-0765538237.iam.gserviceaccount.com. Skips trash, demo, price card, _*, *old*, *template_test*. |
| Parse PDFs | ingest/pdf_parser.py | PyMuPDF first; OCR (ocrmypdf) fallback when text density is low. ~8.8% OCR rate. |
| Classify pages | ingest/section_classifier.py | Regex heuristic: cover / intro / scope / pricing / timeline / deliverables / terms / contact / case_study / about / other. |
| Build canonical JSON | ingest/canonical.py | One JSON per proposal in data/canonical/. |
| Embed | rag/embed.py | Local bge-m3 via sentence-transformers (normalized vectors, no paid API). |
| Upsert | rag/qdrant_client.py | 5 Qdrant collections (namespaced webspot_proposal_*). |
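The "Classify pages" step is described as a regex heuristic over page text. A minimal sketch of that idea follows. The patterns and their order are assumptions made for illustration; the actual rules live in ingest/section_classifier.py and cover more labels than shown here.

```python
import re

# Hypothetical patterns, not the ones in ingest/section_classifier.py.
# Order matters: the first label whose pattern matches wins.
SECTION_PATTERNS = [
    ("pricing", re.compile(r"\b(pricing|price|investment|total)\b", re.I)),
    ("timeline", re.compile(r"\b(timeline|schedule|milestones?)\b", re.I)),
    ("scope", re.compile(r"\bscope\b", re.I)),
    ("deliverables", re.compile(r"\bdeliverables?\b", re.I)),
    ("terms", re.compile(r"\b(terms|conditions)\b", re.I)),
    ("contact", re.compile(r"\b(contact|email|phone)\b", re.I)),
]


def classify_page(text: str) -> str:
    """Return the first matching section label, or "other" if none match."""
    for label, pattern in SECTION_PATTERNS:
        if pattern.search(text):
            return label
    return "other"
```

A first-match heuristic like this is cheap and easy to debug, at the cost of misclassifying pages that mention several section keywords.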
Retrieval (cli.py search)

Five Qdrant collections on the shared agent-qdrant container (127.0.0.1:6333):

| Collection | Granularity | Status |
|---|---|---|
| webspot_proposal_summaries | 1 vec / proposal | populated |
| webspot_proposal_sections | 1 vec / section | populated (primary retrieval unit) |
| webspot_proposal_blocks | 1 vec / block (pricing/scope/terms/deliverables only) | populated |
| webspot_proposal_edit_diffs | n/a | created, Week 4 work |
| webspot_proposal_style_rules | n/a | created, Week 4 work |
rag/retrieve.py returns top-5 proposals + top-12 sections per query.
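Because the embeddings are normalized, cosine similarity reduces to a plain dot product. In practice Qdrant performs the search; the pure-Python sketch below only illustrates the ranking idea, and `normalize` and `top_k` are hypothetical helper names rather than functions from rag/retrieve.py.

```python
import heapq
import math


def normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_k(query_vec, items, k):
    """Rank (item_id, unit_vector) pairs by dot product with a unit query vector.

    Returns the k best (score, item_id) tuples, highest score first.
    """
    scored = (
        (sum(q * v for q, v in zip(query_vec, vec)), item_id)
        for item_id, vec in items
    )
    return heapq.nlargest(k, scored)
```

With k=5 over summary vectors and k=12 over section vectors, this mirrors the top-5 proposals + top-12 sections shape described above.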
Generation (scripts/06_generate_abc_store_v2.py)

The master is a real Google Slides file seeded with anchor strings: {{PROPOSAL_TITLE}}, {{CLIENT_NAME}}, {{PROPOSAL_DATE}}, {{CLIENT_LOCATION}}, {{CLIENT_AUDIENCE}}, {{SERVICE_DESCRIPTION}} (×24 occurrences), and many more. It was built by scripts/03b_build_master_via_rclone_token.py from the original PPTX.
Why rclone's OAuth token, not the service account? The SA has no Drive quota and only read access to the folder; rclone is authenticated as Jonah (the owner), so it can upload the PPTX, convert it to native Slides, and edit freely. The SA is then added as Editor so future generator runs work headless.
Full text-frame replacement (v2 strategy) avoids the partial-anchor smashing problem ("GENERATIONWORKSHOP") that the earlier replaceAllText-only path produced. For every slide we author content for, the script:
Then exports to PDF via drive.files.export(mimeType="application/pdf").
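Full text-frame replacement can be expressed as a pair of Slides API `batchUpdate` requests per shape: clear all existing text, then insert the new text. The sketch below only builds the request payloads. `frame_replacement_requests` is a hypothetical helper name, and how the v2 script actually assembles its requests is not shown in this README.

```python
def frame_replacement_requests(object_id: str, new_text: str) -> list:
    """Build batchUpdate requests that replace all text in one shape.

    Assumes the shape currently contains text; a deleteText over an
    empty shape may be rejected by the API, so callers would need to
    skip the first request in that case.
    """
    return [
        # Remove everything currently in the text frame.
        {"deleteText": {"objectId": object_id, "textRange": {"type": "ALL"}}},
        # Write the authored content from the start of the frame.
        {"insertText": {"objectId": object_id, "insertionIndex": 0, "text": new_text}},
    ]
```

Assuming an authenticated google-api-python-client `service` object, the collected requests could be sent with `service.presentations().batchUpdate(presentationId=..., body={"requests": requests}).execute()`. Replacing the whole frame sidesteps the case where an anchor is substituted while the text around it stays in place.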
scripts/04_visual_fidelity.py (+ 04b streaming and 04c fast variants): renders the new Slides → PDF, diffs page-by-page against the original PPTX-converted PDF using a perceptual hash (8×8 average-hash) to flag drift between master and the new template. Used once at template-build time, not per-client.
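An 8×8 average-hash reduces each page image to 64 bits, so two renders can be compared by Hamming distance. The following is a dependency-free sketch that assumes the page is already a grayscale 2D list whose dimensions are multiples of 8. The real scripts presumably work on rendered images through an imaging library, which is not shown here.

```python
def average_hash(gray, size=8):
    """Compute a size*size-bit average hash of a grayscale image.

    gray is a 2D list of pixel intensities; its height and width are
    assumed to be divisible by size.
    """
    block_h = len(gray) // size
    block_w = len(gray[0]) // size
    cells = []
    for r in range(size):
        for c in range(size):
            block = [
                gray[y][x]
                for y in range(r * block_h, (r + 1) * block_h)
                for x in range(c * block_w, (c + 1) * block_w)
            ]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        # One bit per cell: 1 if brighter than the overall mean.
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming(a: int, b: int) -> int:
    """Count differing bits between two hashes."""
    return bin(a ^ b).count("1")
```

A page would be flagged as drifted when the distance exceeds some threshold; the threshold the scripts use is not stated in this README.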
scripts/07_audit_abc_store.py extracts text per page and runs:
Writes AUDIT.md per round. Loop until clean.
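The banned-phrase check can be sketched as a case-insensitive scan over per-page text. `find_banned_phrases` is a hypothetical name, and the real audit script may match differently (for example with regexes or word boundaries).

```python
def find_banned_phrases(pages, banned):
    """Return (page_number, phrase) for every banned phrase found.

    pages is a list of page texts where index 0 is page 1.
    Matching is a case-insensitive substring test.
    """
    hits = []
    for page_number, text in enumerate(pages, start=1):
        lowered = text.lower()
        for phrase in banned:
            if phrase.lower() in lowered:
                hits.append((page_number, phrase))
    return hits
```

An empty result would correspond to a clean round for this check; any hit gives the page number to fix before the next round.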
| Path | Purpose |
|---|---|
| cli.py | ingest / search / report / notify |
| ingest/drive_ingest.py | Drive walker (service account) |
| ingest/pdf_parser.py | PyMuPDF + ocrmypdf fallback |
| ingest/section_classifier.py | Per-page regex classifier |
| ingest/canonical.py | Canonical JSON builder |
| rag/embed.py | Local bge-m3 embeddings |
| rag/qdrant_client.py | 5-collection upserter |
| rag/retrieve.py | Top-5 proposals + top-12 sections |
| scripts/01_analyze_master.py | Deep PPTX analysis → master_analysis.json |
| scripts/03b_build_master_via_rclone_token.py | PPTX → native Google Slides w/ anchors (live builder) |
| scripts/04*.py | Visual-fidelity checks (3 variants) |
| scripts/05_generate_abc_store.py | First-gen generator (v1, anchor-only) |
| scripts/06_generate_abc_store_v2.py | Current generator: full text-frame replacement, round-aware |
| scripts/07_audit_abc_store.py | Per-round audit + PNG renders |
| data/canonical/ | One JSON per proposal (251 files) |
| data/pdf_cache/ | Mirrored Drive structure, raw PDFs |
| data/page_library.yaml | Routing index: which layouts always include, which gate on keywords |
| data/master_build.json | Live master Slides ID + anchor occurrence map |
| data/INGESTION_REPORT.md | Stats + confidence flags |
| data/generated_drafts/ | Output PDFs + manifests, one set per round |
| logs/ingest.log | Append-only ingestion log |
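The routing idea behind data/page_library.yaml (some layouts are always included, others gate on keywords) can be sketched as below. The entry structure, layout names, and keywords are all hypothetical, since the file's real schema is not shown in this README.

```python
# Hypothetical entries; the real schema of data/page_library.yaml may differ.
PAGE_LIBRARY = [
    {"layout": "cover", "always": True, "keywords": []},
    {"layout": "ai_training", "always": False, "keywords": ["ai training", "workshop"]},
    {"layout": "ecommerce", "always": False, "keywords": ["ecommerce", "shop"]},
]


def select_layouts(brief_text, library=PAGE_LIBRARY):
    """Pick layouts that are always-on or whose keywords appear in the brief."""
    brief = brief_text.lower()
    return [
        entry["layout"]
        for entry in library
        if entry["always"] or any(keyword in brief for keyword in entry["keywords"])
    ]
```

For example, `select_layouts("ecommerce site rebuild")` would keep the cover plus the ecommerce layout and drop the AI training one. Plain substring matching is simple to reason about, but it will miss synonyms and can over-match short keywords.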
- Populate the webspot_proposal_edit_diffs + webspot_proposal_style_rules collections (currently created but empty). This lets the generator learn from Jonah's edits across rounds.
- page_library.yaml routing scoring is keyword-only, which is fine for now; upgrade to embedding-similarity scoring when the corpus grows.
- The generator is currently client-specific (scripts/06_generate_abc_store_v2.py). Generalize into scripts/generate.py <brief.md> <round> once the second client lands.
- To pick up new proposals, run cli.py ingest (idempotent, picks up new files only).
- To query the index, run cli.py search "X".

Index location: /opt/agent/webspot_proposal_agent/ · master Slides: 1PlMaMn2sOAkqy1GJNnsj292zShEolrCgHLezCVopSFA · Drive folder: 1y0jWe8aNTVMGWihwMhAk8oQLf7UquRQA (WEBSPOT | PROPOSALS).