Objective

Treat retrieval as a first-class system: ingestion, chunking, routing, and evaluation—not only “top‑k cosine similarity.” This hub connects to the RAG in production essay and expands context via retrieval theme page.

Design checkpoints

  1. Chunk boundaries preserve semantics (tables, definitions, sections).
  2. Corpus versions and timestamps are traceable through to answers.
  3. Queries route to retrieval vs. direct completion when appropriate.
  4. Evaluation includes “no good chunk” and “stale authority” cases—see evaluation hub.

Cross-track links

Prompt templates that list only valid citation IDs belong to prompt experiment. Operational logging for replay sits in operations. Readers following Path B hit this track before looping back to evaluation.

Long read

For chunking trade-offs, freshness pipelines, routing, and rehearsed failure modes, read RAG in production: chunking, freshness, and failure modes. The retrieval systems theme page stays useful as a checklist after the essay.

Ingestion and evaluation hooks

Version corpora and store source timestamps; pair vector search with metadata filters when users ask for “latest” or region-specific policy. Wire empty-result rate, stale-citation probes, and ambiguous-query strata into the same evaluation philosophy as evaluation hub—offline scores alone will not catch authority drift.

When things go quiet

Support tickets about “wrong but plausible” answers often trace to chunk boundaries or partial re-indexes, not model swaps. Log hashed retrieval sets per request so postmortems can replay with the same IDs and prompt versions.

Beyond dense retrieval

Keyword or sparse retrieval still wins for exact SKUs, legal citations, and proper nouns; hybrid pipelines (dense + BM25 + metadata filters) are common in production. Rerankers add latency—budget them for high-stakes queries or after recall produces a manageable candidate set.

Access control and tenancy

Multi-tenant products must filter chunks by user or team before the model sees them. Tests should include “must not retrieve another tenant’s documents” as a first-class failure mode, alongside relevance metrics.