Skylar Payne

Series

Practical AI Evals

A reading path for teams that need AI quality to become inspectable, fixable, and steadily better. The point is not to worship a metric. The point is to build the loop that tells you what broke, why, and what to change next.

Who this is for

Recommended order

  1. TIL · ai · evals · ai reliability

    Logs tell you what happened. Annotations tell you what it meant, why it failed, and whether the fix helped.

  2. TIL · ai · agents · evals

    One giant prompt can hide five separate jobs. Split the work so each part has a smaller contract and a failure you can actually name.

  3. TIL · ai · rag · evals

    A bad RAG answer does not tell you whether retrieval failed, generation failed, or the product asked an impossible question. Split the blame before fixing anything.

  4. TIL · ai · evals · ai reliability

    If an AI answer goes sideways and you cannot see the prompt, model, latency, tokens, retrieved context, and failure path, you are debugging from vibes.

  5. TIL · ai · evals · ai reliability

    If every eval run emails a customer, updates production state, or fires a webhook, you do not have an eval harness. You have a hostage situation.

  6. TIL · ai · rag · evals

    Before blaming the model, inspect the chunks. Duplicate, empty, bloated, or low-signal chunks can wreck retrieval quietly.

  7. TIL · ai · evals · ai reliability

    When a multi-step AI run fails once and then refuses to fail again, replay beats superstition. Capture the calls, context, and intermediate state.

  8. TIL · ai · rag · evals

    A citation is not proof just because the model printed a source name. Verify that the source exists and actually supports the claim.

  9. Essay · ai · evals · engineering

    A streamlined system for AI evaluation that closes the gap between seeing problems and fixing them.

  10. Essay · ai · rag · evals

    How Frigade Slashed Latency & Boosted User Helpfulness

  11. Essay · ai · flywheel · evaluation

  12. Essay · ai · evals · flywheel

    An AI Maturity Model

  13. Essay · ai · evals

    A Practical Guide to Evaluation-Driven Improvement

If you only read one thing

Start with the piece that turns evals from a measurement project into a product operating loop.

Where to go next

Use the AI Evals topic hub for concepts, TILs, and future case-study links.