/01
Corpus and chunking
Decide what is in scope, who can access it, and how it is chunked before touching a model.
- Source list
- Per-tenant scoping
- Chunk size
- Refresh cadence
A pre-build checklist for retrieval-augmented generation in a SaaS product: corpus, chunking, evals, citations, and cost.
A RAG plan you can execute without producing a fragile demo, with measurable accuracy from day one.
/01
Decide what is in scope, who can access it, and how it is chunked before touching a model.
/02
Build a small eval set from real questions and pin a per-user cost ceiling before launch.
Often Postgres with pgvector is enough at the start. Move to a dedicated store when query patterns demand it.
Run the eval set on every change and watch citation correctness, not just answer overlap.