Who this path serves
Engineers who can ship code but are still calibrating how LLM behavior fits into CI/CD and product promises. You will establish interfaces (prompts, schemas) before you scale traffic, then layer measurement that matches those interfaces.
Step 1 — Prompts as systems
Start with the prompt experiment hub, then read Prompt engineering as interface design. Supplement with the prompts theme page for vocabulary. Goal: make versioned artifacts and validators feel routine before you add model churn.
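The idea of a prompt as a versioned artifact with a validator can be sketched in a few lines. This is a hypothetical shape, not a prescribed library: `PromptArtifact`, its field names, and the `summarize_ticket` example are all invented for illustration, and the "contract" here is deliberately minimal (required JSON keys).

```python
import json
from dataclasses import dataclass

# Hypothetical shape for a versioned prompt artifact: the template and its
# output contract travel together, so any prompt change is a reviewable diff.
@dataclass(frozen=True)
class PromptArtifact:
    name: str
    version: str            # bump on any template or contract change
    template: str
    required_keys: tuple    # minimal output contract for the model's JSON reply

    def render(self, **vars) -> str:
        return self.template.format(**vars)

    def validate(self, raw_output: str) -> bool:
        # A validator turns "the model replied" into "the reply honors the contract".
        try:
            data = json.loads(raw_output)
        except json.JSONDecodeError:
            return False
        return all(k in data for k in self.required_keys)

summarize_v2 = PromptArtifact(
    name="summarize_ticket",
    version="2.0.0",
    template="Summarize this support ticket as JSON with keys "
             "'summary' and 'priority':\n{ticket}",
    required_keys=("summary", "priority"),
)

prompt = summarize_v2.render(ticket="Login page times out for EU users.")
ok = summarize_v2.validate('{"summary": "EU login timeouts", "priority": "high"}')
```

Freezing the dataclass and bumping `version` on every change is the point: once the artifact is immutable and named, model churn becomes a controlled variable rather than silent drift.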
Step 2 — Evaluation & risk
Open the evaluation experiment hub, then Beyond accuracy. Tie the contracts from Step 1 directly to test cases. Use the measurement theme for drift and rubric notes.
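Tying a Step 1 contract to test cases might look like the sketch below. Everything here is assumed for illustration: `CASES`, the `fake_model` stand-in (swap in your real client), and the two-part check that scores schema conformance and content separately instead of exact-match accuracy.

```python
import json

# Hypothetical smoke suite: each case pairs an input with contract checks,
# reusing the Step 1 contract (required JSON keys) rather than exact matches.
CASES = [
    {"ticket": "Login page times out", "must_mention": "login"},
    {"ticket": "Billing double-charged", "must_mention": "billing"},
]

def fake_model(ticket: str) -> str:
    # Stand-in for a real LLM call; replace with your client.
    return json.dumps({"summary": f"Issue: {ticket.lower()}", "priority": "medium"})

def check(case: dict) -> dict:
    raw = fake_model(case["ticket"])
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"schema": False, "content": False}
    schema_ok = {"summary", "priority"} <= data.keys()
    content_ok = case["must_mention"] in data.get("summary", "").lower()
    return {"schema": schema_ok, "content": content_ok}

results = [check(c) for c in CASES]
pass_rate = sum(r["schema"] and r["content"] for r in results) / len(results)
```

Reporting schema and content failures separately tells you whether a regression broke the interface or the behavior, which is exactly the distinction "beyond accuracy" asks for.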
Step 3 — Optional retrieval primer
If your product will use documents soon, preview the RAG experiment without diving into the full essay yet, or switch to Path B if retrieval is already on fire.
Related
See Cost & latency for when token budgets bite, and the experiments hub for the big picture.
Pacing
Expect roughly one focused week per major step if you are reading and sketching artifacts, not passively skimming. Prompt interfaces and evaluation contracts pay off fastest when you apply them to one real surface in your product rather than to abstract exercises.
Other paths
After this sequence, Path B (Data-heavy) goes deep on retrieval first; Path C is for cross-team release checklists. The reading paths hub compares all three.
Concrete outputs
By the end of Step 2, aim for a written contract list, a linked CI job or script for smoke tests, and a single place (dashboard row or spreadsheet) for weekly rubric sampling—even if rough. Artifacts matter more than reading speed.
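The "single place for weekly rubric sampling" can start as simply as a script that appends rows to a shared spreadsheet. This is one possible minimal sketch, not a prescribed tool: `sample_for_rubric`, the column names, and the fixed sample size are all assumptions.

```python
import csv
import io
import random
from datetime import date

# Hypothetical weekly rubric sample: pull a few production outputs and emit
# one CSV row per item, with a blank score column for a human reviewer.
def sample_for_rubric(outputs: list, k: int = 3, seed: int = 0) -> str:
    rng = random.Random(seed)  # fixed seed keeps the weekly sample reproducible
    picks = rng.sample(outputs, min(k, len(outputs)))
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["week", "output", "rubric_score"])  # score filled in by hand
    for out in picks:
        writer.writerow([date.today().isoformat(), out, ""])
    return buf.getvalue()

csv_text = sample_for_rubric(["reply A", "reply B", "reply C", "reply D"])
```

Even this rough version satisfies the Step 2 output: it is a repeatable artifact you can run weekly and paste into whatever dashboard or spreadsheet your team already uses.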