
Webspot Proposal Agent — Full Guide

What it is: a no-paid-model agent that takes a client brief, retrieves the closest past Webspot proposals, generates pricing + content, and outputs a finished branded PDF proposal by editing a copy of the master Google Slides deck. (Note: the design layer is Google Slides, not Canva — Webspot's existing 54-slide A4 portrait master is the source of visual fidelity.)

Where it lives: /opt/agent/webspot_proposal_agent/ on Hetzner (ubuntu-8gb-hel1-1).

Status (2026-05-12): ingestion live (251 proposals indexed), master deck built with anchor placeholders, end-to-end generation proven on the ABC Store client (4 rounds, R4 = clean).


How to use it

1. Write the brief (intent doc)

Drop a markdown file describing the client and the requested services into data/. Example: data/abc_store_brief_intent.md.

2. Search the corpus

cd /opt/agent/webspot_proposal_agent
./venv/bin/python cli.py search "AI training + customer service agent for retail"
./venv/bin/python cli.py search "ecommerce site rebuild" --json

Returns the top-5 closest past proposals plus the top-12 most relevant sections (pricing, scope, deliverables, etc.). These become reference material when drafting.

3. Generate the proposal

./venv/bin/python scripts/06_generate_abc_store_v2.py <round_number>

What that script does, in order:

  1. Copies WEBSPOT_PROPOSAL_MASTER_v1 (Slides ID 1PlMaMn2sOAkqy1GJNnsj292zShEolrCgHLezCVopSFA).
  2. Reads the keep-set: which of the 54 master slides apply to this brief.
  3. Deletes the rest.
  4. For each kept slide, fully replaces text-frame content via the Slides API (delete-old → insert-new → re-apply original first-run style — preserves fonts, sizes, colors, spacing).
  5. Exports as PDF via Drive API.
  6. Renders each PDF page to PNG at 110 DPI for audit.
  7. Writes a manifest JSON: data/generated_drafts/YYYY-MM-DD_<client>_proposal_r<N>_manifest.json.
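The copy-and-prune portion of those steps (1-3) can be sketched as a pure request builder. build_delete_requests is an illustrative helper, not the script's actual code; slide object IDs would come from slides.presentations().get():

```python
# Sketch of steps 1-3 (copy master, read keep-set, delete the rest).
def build_delete_requests(slide_ids, keep_indexes):
    """Slides API batchUpdate requests that delete every slide whose
    position in the master is not in the keep-set."""
    return [
        {"deleteObject": {"objectId": sid}}
        for i, sid in enumerate(slide_ids)
        if i not in keep_indexes
    ]

# e.g. a 5-slide deck where the brief keeps slides 0, 2 and 4:
requests = build_delete_requests(["p1", "p2", "p3", "p4", "p5"], {0, 2, 4})
# copy_id = drive.files().copy(fileId=MASTER_ID).execute()["id"]
# slides.presentations().batchUpdate(
#     presentationId=copy_id, body={"requests": requests}).execute()
```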

4. Audit each round

./venv/bin/python scripts/07_audit_abc_store.py <round_number>

Extracts text, flags banned phrases (other client names, off-brief services like "ComfyUI workshop"), runs visual checks, writes AUDIT.md. Loop rounds until clean.
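The banned-phrase pass reduces to a case-insensitive scan over the extracted page texts. A minimal sketch; the list entries are illustrative ("XYZ Corp" stands in for other client names) and audit_pages is an assumed name, not the script's real function:

```python
BANNED = ["ComfyUI workshop", "XYZ Corp"]  # illustrative entries only

def audit_pages(pages, banned=BANNED):
    """Return (page_number, phrase) hits; an empty list means clean."""
    hits = []
    for n, text in enumerate(pages, start=1):
        for phrase in banned:
            if phrase.lower() in text.lower():
                hits.append((n, phrase))
    return hits

audit_pages(["Welcome, ABC Store", "Includes a ComfyUI workshop"])
# -> [(2, "ComfyUI workshop")]
```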

5. Deliver

PDF lands in data/generated_drafts/ and is mirrored to the BRIAN SHARED Drive folder so it appears on Jonah's Mac within ~4s.

Other CLI commands

./venv/bin/python cli.py ingest                    # full Drive walk (idempotent)
./venv/bin/python cli.py ingest --limit 10         # smoke test
./venv/bin/python cli.py report                    # re-emit INGESTION_REPORT.md
./venv/bin/python cli.py notify                    # send TG COMMS completion notice

How it works

Pipeline overview

Drive corpus  →  Retrieval (Qdrant)  →  Generation (Slides API)  →  PDF
   (275 PDFs)      (bge-m3 local)         (anchor replacement)       (audit)

Stage A — Ingestion (cli.py ingest)

| Step | File | Detail |
|---|---|---|
| Walk Drive | ingest/drive_ingest.py | Recursive via service account webspot-proposal-agent@gen-lang-client-0765538237.iam.gserviceaccount.com. Skips trash, demo, price card, _*, *old*, *template_test*. |
| Parse PDFs | ingest/pdf_parser.py | PyMuPDF first; OCR (ocrmypdf) fallback when text density is low. ~8.8% OCR rate. |
| Classify pages | ingest/section_classifier.py | Regex heuristic — cover / intro / scope / pricing / timeline / deliverables / terms / contact / case_study / about / other. |
| Build canonical JSON | ingest/canonical.py | One JSON per proposal in data/canonical/. |
| Embed | rag/embed.py | Local bge-m3 via sentence-transformers (normalized vectors, no paid API). |
| Upsert | rag/qdrant_client.py | 5 Qdrant collections (namespaced webspot_proposal_*). |
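The OCR-fallback decision in ingest/pdf_parser.py plausibly reduces to a text-density check like the following; needs_ocr and the 50-chars-per-page cutoff are assumptions, not the module's real names:

```python
def needs_ocr(page_texts, min_chars_per_page=50):
    """True when the average extracted text per page is too thin, i.e.
    the PDF is probably scanned and should go through ocrmypdf."""
    if not page_texts:
        return True
    avg = sum(len(t.strip()) for t in page_texts) / len(page_texts)
    return avg < min_chars_per_page

# with PyMuPDF, the page texts would come from something like:
# import fitz
# doc = fitz.open("proposal.pdf")
# texts = [page.get_text() for page in doc]
# if needs_ocr(texts): ...  # shell out to `ocrmypdf in.pdf out.pdf`
```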

Stage B — Retrieval (rag/retrieve.py)

Five Qdrant collections live on the shared agent-qdrant container (127.0.0.1:6333):

| Collection | Granularity | Status |
|---|---|---|
| webspot_proposal_summaries | 1 vec / proposal | populated |
| webspot_proposal_sections | 1 vec / section | populated (primary retrieval unit) |
| webspot_proposal_blocks | 1 vec / block (pricing/scope/terms/deliverables only) | populated |
| webspot_proposal_edit_diffs | - | created, Week 4 work |
| webspot_proposal_style_rules | - | created, Week 4 work |

rag/retrieve.py returns top-5 proposals + top-12 sections per query.
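A hedged sketch of what a retrieval call against those collections looks like. The model and collection names come from this guide, but dedupe_hits is an assumed helper, and the client calls are commented out because they need the model weights and the live Qdrant container:

```python
# from sentence_transformers import SentenceTransformer
# from qdrant_client import QdrantClient
#
# model = SentenceTransformer("BAAI/bge-m3")
# client = QdrantClient(host="127.0.0.1", port=6333)
# vec = model.encode("ecommerce site rebuild", normalize_embeddings=True)
# proposals = client.search(collection_name="webspot_proposal_summaries",
#                           query_vector=vec, limit=5)
# sections = client.search(collection_name="webspot_proposal_sections",
#                          query_vector=vec, limit=12)

def dedupe_hits(hits):
    """Collapse section hits to the best score per proposal, best first,
    so one proposal cannot crowd out the whole top-12."""
    best = {}
    for proposal_id, score in hits:
        if proposal_id not in best or score > best[proposal_id]:
            best[proposal_id] = score
    return sorted(best.items(), key=lambda kv: -kv[1])
```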

Stage C — Generation (scripts/06_generate_abc_store_v2.py)

The master is a real Google Slides file with anchor strings sprinkled in: {{PROPOSAL_TITLE}}, {{CLIENT_NAME}}, {{PROPOSAL_DATE}}, {{CLIENT_LOCATION}}, {{CLIENT_AUDIENCE}}, {{SERVICE_DESCRIPTION}} (×24 occurrences), and many more — built by scripts/03b_build_master_via_rclone_token.py from the original PPTX.

Why rclone's OAuth token, not the service account? The SA has no Drive quota and is only read-level on the folder; rclone is authenticated as Jonah (the owner), so it can upload + convert PPTX → native Slides + edit freely. The SA is then added as Editor so future generator runs work headless.

Full text-frame replacement (v2 strategy) avoids the partial-anchor smashing problem ("GENERATIONWORKSHOP") that the earlier replaceAllText-only path produced. For every slide we author content for, the script:

  1. Reads the existing text frame's first-run style.
  2. Deletes the entire frame contents.
  3. Inserts the new generated text.
  4. Re-applies the captured style.

Then exports to PDF via drive.files.export(mimeType="application/pdf").
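The four replacement steps map onto three batchUpdate requests per text frame. A sketch under the assumption that the frame's object ID and captured style are already in hand; build_replace_requests is an illustrative name:

```python
def build_replace_requests(shape_id, new_text, first_run_style):
    return [
        # 2. wipe the frame ("ALL" clears every existing run)
        {"deleteText": {"objectId": shape_id, "textRange": {"type": "ALL"}}},
        # 3. insert the generated copy at the start of the frame
        {"insertText": {"objectId": shape_id, "insertionIndex": 0,
                        "text": new_text}},
        # 4. re-apply the captured first-run style across the new text;
        #    the field mask must list exactly the style keys being set
        {"updateTextStyle": {"objectId": shape_id,
                             "textRange": {"type": "ALL"},
                             "style": first_run_style,
                             "fields": ",".join(first_run_style)}},
    ]

reqs = build_replace_requests("title_frame", "ABC Store Proposal",
                              {"bold": True, "fontFamily": "Inter"})
# slides.presentations().batchUpdate(presentationId=deck_id,
#                                    body={"requests": reqs}).execute()
```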

Stage D — Visual fidelity check

scripts/04_visual_fidelity.py (+ 04b streaming and 04c fast variants): renders the new Slides → PDF, diffs page-by-page against the original PPTX-converted PDF using a perceptual hash (8×8 average-hash) to flag drift between master and the new template. Used once at template-build time, not per-client.
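The 8×8 average-hash comparison is simple enough to sketch in full. This version takes an already-downscaled 8×8 grayscale grid rather than rendering PDF pages, and the 5-bit drift threshold is an assumption:

```python
def average_hash(pixels):
    """One bit per cell, set when the cell is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return sum(1 << i for i, p in enumerate(flat) if p > mean)

def hamming(a, b):
    return bin(a ^ b).count("1")

def drifted(page_a, page_b, threshold=5):
    """Flag a page pair whose hashes differ by more than the threshold."""
    return hamming(average_hash(page_a), average_hash(page_b)) > threshold
```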

Stage E — Per-round audit

scripts/07_audit_abc_store.py extracts text per page, flags banned phrases (other client names, off-brief services), runs the visual checks, and writes AUDIT.md per round. Loop until clean.


Current numbers (post-ingestion, 2026-05-10)


Files map

| Path | Purpose |
|---|---|
| cli.py | ingest / search / report / notify |
| ingest/drive_ingest.py | Drive walker (service account) |
| ingest/pdf_parser.py | PyMuPDF + ocrmypdf fallback |
| ingest/section_classifier.py | Per-page regex classifier |
| ingest/canonical.py | Canonical JSON builder |
| rag/embed.py | Local bge-m3 embeddings |
| rag/qdrant_client.py | 5-collection upserter |
| rag/retrieve.py | Top-5 proposals + top-12 sections |
| scripts/01_analyze_master.py | Deep PPTX analysis → master_analysis.json |
| scripts/03b_build_master_via_rclone_token.py | PPTX → native Google Slides w/ anchors (live builder) |
| scripts/04*.py | Visual-fidelity checks (3 variants) |
| scripts/05_generate_abc_store.py | First-gen generator (v1, anchor-only) |
| scripts/06_generate_abc_store_v2.py | Current generator — full text-frame replacement, round-aware |
| scripts/07_audit_abc_store.py | Per-round audit + PNG renders |
| data/canonical/ | One JSON per proposal (251 files) |
| data/pdf_cache/ | Mirrored Drive structure, raw PDFs |
| data/page_library.yaml | Routing index — which layouts always include, which gate on keywords |
| data/master_build.json | Live master Slides ID + anchor occurrence map |
| data/INGESTION_REPORT.md | Stats + confidence flags |
| data/generated_drafts/ | Output PDFs + manifests, one set per round |
| logs/ingest.log | Append-only ingestion log |

Hard constraints


Known gaps / next work


Trigger words / one-liners


Index location: /opt/agent/webspot_proposal_agent/ · master Slides: 1PlMaMn2sOAkqy1GJNnsj292zShEolrCgHLezCVopSFA · Drive folder: 1y0jWe8aNTVMGWihwMhAk8oQLf7UquRQA (WEBSPOT | PROPOSALS).