Series

Practical AI Evals

A reading path for teams that need AI quality to become inspectable, fixable, and steadily better. The point is not to worship a metric. The point is to build the loop that tells you what broke, why, and what to change next.

Who this is for

Engineering leaders trying to make AI quality less mysterious.
Product teams shipping agents, RAG, onboarding assistants, copilots, or workflow automation.
Builders who have traces, dashboards, and prompts, but no reliable improvement loop yet.

Recommended order

May 28, 2026

TIL: Annotate AI Traces

TIL · ai · evals · ai reliability

Logs tell you what happened. Annotations tell you what it meant, why it failed, and whether the fix helped.

May 28, 2026

TIL: Break AI Workflows Into Parts You Can Grade

TIL · ai · agents · evals

One giant prompt can hide five separate jobs. Split the work so each part has a smaller contract and a failure you can actually name.

May 28, 2026

TIL: Evaluate RAG Retrieval Separately

TIL · ai · rag · evals

A bad RAG answer does not tell you whether retrieval failed, generation failed, or the product asked an impossible question. Split the blame before fixing anything.

May 28, 2026

TIL: Instrument AI Calls Before You Debug

TIL · ai · evals · ai reliability

If an AI answer goes sideways and you cannot see the prompt, model, latency, tokens, retrieved context, and failure path, you are debugging from vibes.

May 28, 2026

TIL: Make AI Pipelines Safe to Replay

TIL · ai · evals · ai reliability

If every eval run emails a customer, updates production state, or fires a webhook, you do not have an eval harness. You have a hostage situation.

May 28, 2026

TIL: Quality Check RAG Chunks

TIL · ai · rag · evals

Before blaming the model, inspect the chunks. Duplicate, empty, bloated, or low-signal chunks can wreck retrieval quietly.

May 28, 2026

TIL: Record and Replay AI Workflows

TIL · ai · evals · ai reliability

When a multi-step AI run fails once and then refuses to fail again, replay beats superstition. Capture the calls, context, and intermediate state.

May 28, 2026

TIL: Validate RAG Citations

TIL · ai · rag · evals

A citation is not proof just because the model printed a source name. Verify that the source exists and actually supports the claim.

Mar 17, 2026

Evals That Actually Get Used

Essay · ai · evals · engineering

A streamlined system for AI evaluation that closes the gap between seeing problems and fixing them.

May 20, 2025

40% Better, 75% Faster

Essay · ai · rag · evals

How Frigade Slashed Latency & Boosted User Helpfulness

Mar 26, 2025

Quality Assurance for AI

Essay · ai · flywheel · evaluation

Feb 13, 2025

Why Most Companies Fail to Build Strategic Assets with AI

Essay · ai · evals · flywheel

An AI Maturity Model

Feb 8, 2025

The Art of Iterative AI System Development

Essay · ai · evals

A Practical Guide to Evaluation-Driven Improvement

If you only read one thing

Start with the piece that turns evals from a measurement project into a product operating loop.

Where to go next

Use the AI Evals topic hub for concepts, TILs, and future case-study links.