Topic hub
AI Reliability
Reliable AI features are mostly boring on purpose. The work usually sits at the boundary between the model and the rest of the product: traces, schemas, retries, evals, replay, and clear ownership of risky actions.
Start here
- Give the unreliable part a stable interface.
- Capture enough evidence to debug the next failure.
- Validate outputs before they leak into product state.
- Keep risky side effects behind explicit gates.
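The first bullet argues for one stable interface around the model. A minimal sketch, assuming a hypothetical ticket-triage feature: product code calls only `triage_ticket`, and the prompt, parsing, and fallback all live behind it. `triage_ticket`, `TriageResult`, the label set, and the prompt text are illustrative, not from any library.

```python
from dataclasses import dataclass
from typing import Callable, Literal, cast

# Hypothetical label set for a hypothetical support-ticket triage feature.
Label = Literal["billing", "bug", "other"]
LABELS = ("billing", "bug", "other")

# Any callable that takes a prompt and returns raw model text (real client, fake, or replay).
ModelFn = Callable[[str], str]

PROMPT = (
    "Classify this support ticket as billing, bug, or other. "
    "Reply with exactly one word.\n\n{ticket}"
)


@dataclass(frozen=True)
class TriageResult:
    label: Label
    used_fallback: bool  # lets callers and dashboards see how often the model was bypassed


def triage_ticket(ticket: str, model: ModelFn) -> TriageResult:
    """The only entry point product code calls. Prompt, parsing, and fallback stay in here."""
    try:
        raw = model(PROMPT.format(ticket=ticket))
    except Exception:
        # Model errors degrade to a safe default instead of leaking into product code.
        return TriageResult(label="other", used_fallback=True)
    answer = raw.strip().lower()
    if answer in LABELS:
        return TriageResult(label=cast(Label, answer), used_fallback=False)
    # Anything off-script is treated as a miss, not guessed at.
    return TriageResult(label="other", used_fallback=True)
```

Because the model is just a callable, tests can pass a fake or a replayed client, and a model or prompt change touches one function instead of every caller.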
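For the second bullet, a sketch of the evidence one model call could leave behind: prompt, model, latency, token counts, retrieved context, and the failure path, written as one JSON line per call. The client signature `(model, prompt) -> (text, prompt_tokens, completion_tokens)` is an assumption; real SDKs report usage in their own shapes. This version records an exception and returns instead of re-raising, so re-raise if your caller needs it.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Callable, Optional

# Hypothetical client shape: (model, prompt) -> (text, prompt_tokens, completion_tokens)
Client = Callable[[str, str], tuple[str, int, int]]


@dataclass
class Trace:
    model: str
    prompt: str
    retrieved_context: list[str]
    output: Optional[str] = None
    latency_ms: Optional[float] = None
    prompt_tokens: Optional[int] = None
    completion_tokens: Optional[int] = None
    error: Optional[str] = None  # the failure path, if the call blew up


def traced_call(client: Client, model: str, prompt: str, context: list[str], sink: list[str]) -> Trace:
    trace = Trace(model=model, prompt=prompt, retrieved_context=context)
    start = time.perf_counter()
    try:
        text, prompt_tokens, completion_tokens = client(model, prompt)
        trace.output = text
        trace.prompt_tokens = prompt_tokens
        trace.completion_tokens = completion_tokens
    except Exception as exc:
        trace.error = f"{type(exc).__name__}: {exc}"
    finally:
        trace.latency_ms = (time.perf_counter() - start) * 1000
        sink.append(json.dumps(asdict(trace)))  # one JSON line per call, success or failure
    return trace
```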
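For the third bullet (and the TIL about making the model return data), a stdlib-only sketch that parses model output as JSON and rejects anything that does not match a small hypothetical `Extraction` schema before it can reach product state. A schema library would do the same job with less code; the field names here are made up.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Extraction:  # hypothetical schema for a hypothetical invoice-extraction feature
    vendor: str
    total_cents: int
    currency: str


class InvalidOutput(ValueError):
    """Raised when model output must not be written to product state."""


EXPECTED_TYPES = {"vendor": str, "total_cents": int, "currency": str}


def parse_extraction(raw: str) -> Extraction:
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise InvalidOutput(f"not JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise InvalidOutput("expected a JSON object")
    missing = EXPECTED_TYPES.keys() - data.keys()
    extra = data.keys() - EXPECTED_TYPES.keys()
    if missing or extra:
        raise InvalidOutput(f"missing={sorted(missing)} extra={sorted(extra)}")
    for key, expected in EXPECTED_TYPES.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(data[key], expected) or isinstance(data[key], bool):
            raise InvalidOutput(f"{key} should be {expected.__name__}")
    if data["total_cents"] < 0:
        raise InvalidOutput("total_cents must be non-negative")
    return Extraction(**data)
```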
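For the fourth bullet, a sketch of an explicit gate, assuming a hypothetical action format: anything outside a small allowlist of safe kinds waits for a named approver instead of running just because the agent proposed it. The action kinds, `ProposedAction`, and `run_action` are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical allowlist. Everything else (deleting files, sending messages, spending money,
# publishing, mutating production) needs a human.
SAFE_KINDS = frozenset({"read_file", "search", "draft_reply"})


@dataclass(frozen=True)
class ProposedAction:
    kind: str
    args: dict = field(default_factory=dict)


@dataclass(frozen=True)
class GateResult:
    executed: bool
    reason: str


def run_action(
    action: ProposedAction,
    execute: Callable[[ProposedAction], None],
    approved_by: Optional[str] = None,
) -> GateResult:
    """Run allowlisted actions directly; hold everything else until someone approves it."""
    if action.kind not in SAFE_KINDS and not approved_by:
        return GateResult(executed=False, reason=f"{action.kind} is waiting for approval")
    execute(action)
    return GateResult(executed=True, reason=f"approved by {approved_by or 'allowlist'}")
```

Default-deny is the design choice here: a new action kind the agent invents lands in the approval queue, not in production.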
Related writing
No long-form pieces here yet. Start with the short notes.
TILs
- TIL · ai · evals · ai reliability
Logs tell you what happened. Annotations tell you what it meant, why it failed, and whether the fix helped.
- TIL · ai · rag · evals
A bad RAG answer does not tell you whether retrieval failed, generation failed, or the product asked an impossible question. Split the blame before fixing anything.
- TIL · ai · evals · ai reliability
If an AI answer goes sideways and you cannot see the prompt, model, latency, tokens, retrieved context, and failure path, you are debugging from vibes.
- TIL · ai · evals · ai reliability
If every eval run emails a customer, updates production state, or fires a webhook, you do not have an eval harness. You have a hostage situation.
- TIL · ai · agents · ai reliability
Agents get less spooky when they have named states, constrained transitions, and a record of how each decision moved the process forward.
- TIL · ai · agents · ai reliability
Agents should not get to delete files, send messages, spend money, publish content, or mutate production just because the next step looks obvious.
- TIL · ai · rag · evals
Before blaming the model, inspect the chunks. Duplicate, empty, bloated, or low-signal chunks can wreck retrieval quietly.
- TIL · ai · evals · ai reliability
When a multi-step AI run fails once and then refuses to fail again, replay beats superstition. Capture the calls, context, and intermediate state so you can replay the exact run instead of guessing.
- TIL · ai · rag · evals
A citation is not proof just because the model printed a source name. Verify that the source exists and actually supports the claim.
- TIL · ai · ai reliability · ai platforms
If the rest of your app needs data, make the model return data. Do not make downstream code scrape nice-sounding paragraphs forever.
- TIL · ai · ai reliability · ai platforms
AI features get scary when prompts, logs, evals, schemas, fallbacks, and product code all live in the same pile. Give the weird part one stable interface so changes have a place to go.
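For the note on annotations: one way to turn "did the fix help" into a number is to label each failed case with a failure mode and compare counts across runs. A minimal sketch, assuming both runs annotate the same eval cases; the labels and field names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    case_id: str
    passed: bool
    failure_mode: Optional[str] = None  # hypothetical labels, e.g. "stale_context", "wrong_tool"
    note: str = ""                      # the human explanation a raw log cannot give you


def failure_counts(annotations: list[Annotation]) -> Counter:
    return Counter(a.failure_mode for a in annotations if not a.passed and a.failure_mode)


def compare_runs(before: list[Annotation], after: list[Annotation]) -> dict[str, tuple[int, int]]:
    """Map each failure mode to (count before the fix, count after the fix)."""
    b, a = failure_counts(before), failure_counts(after)
    return {mode: (b[mode], a[mode]) for mode in sorted(b.keys() | a.keys())}
```

A fix that removes one failure mode while quietly creating another shows up as two rows moving in opposite directions.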
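For the RAG blame note, a sketch of splitting the blame per eval case. It assumes each case has human-marked gold document IDs (empty when nothing in the corpus answers the question) and a correctness judgment for the final answer; `RagCase` and `assign_blame` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Blame(Enum):
    OK = "ok"
    IMPOSSIBLE_QUESTION = "impossible_question"  # nothing in the corpus answers it
    RETRIEVAL = "retrieval"                      # the answer exists but was not retrieved
    GENERATION = "generation"                    # the right context was retrieved, answer still wrong


@dataclass(frozen=True)
class RagCase:
    gold_doc_ids: frozenset[str]        # docs a human marked as answering the question
    retrieved_doc_ids: tuple[str, ...]  # what the retriever actually returned
    answer_correct: bool


def assign_blame(case: RagCase) -> Blame:
    if case.answer_correct:
        return Blame.OK
    if not case.gold_doc_ids:
        return Blame.IMPOSSIBLE_QUESTION
    if not case.gold_doc_ids & set(case.retrieved_doc_ids):
        return Blame.RETRIEVAL
    return Blame.GENERATION
```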
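For the eval-harness note, a sketch of keeping side effects out of eval runs by passing them in: production would hand the feature a real implementation, while evals hand it a recorder that only notes what would have happened. `handle_refund_request` stands in for any feature where the model decides and the app acts; its keyword check is a placeholder for a model call, and the URLs are example values.

```python
from dataclasses import dataclass, field
from typing import Protocol


class SideEffects(Protocol):
    def send_email(self, to: str, body: str) -> None: ...
    def fire_webhook(self, url: str, payload: dict) -> None: ...


@dataclass
class RecordingSideEffects:
    """Used in evals: records what the app would have done; nothing leaves the process."""
    calls: list[tuple[str, tuple]] = field(default_factory=list)

    def send_email(self, to: str, body: str) -> None:
        self.calls.append(("send_email", (to, body)))

    def fire_webhook(self, url: str, payload: dict) -> None:
        self.calls.append(("fire_webhook", (url, payload)))


def handle_refund_request(message: str, effects: SideEffects) -> str:
    # Placeholder decision; in a real feature this is where the model call would go.
    decision = "approve" if "refund" in message.lower() else "ignore"
    if decision == "approve":
        effects.send_email("customer@example.com", "Your refund is on the way.")
        effects.fire_webhook("https://example.com/hooks/refund", {"status": "approved"})
    return decision
```

The eval can then assert on `calls` (was an email attempted, to whom) without anyone's inbox being involved.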
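For the agents note, a sketch of named states with an explicit transition table and a history of why each move happened. The states and allowed transitions describe a hypothetical research-and-draft workflow, not a general agent design.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    PLANNING = "planning"
    RETRIEVING = "retrieving"
    DRAFTING = "drafting"
    AWAITING_REVIEW = "awaiting_review"
    DONE = "done"
    FAILED = "failed"


# Hypothetical workflow: which states may follow which. Anything not listed is illegal.
TRANSITIONS = {
    State.PLANNING: {State.RETRIEVING, State.FAILED},
    State.RETRIEVING: {State.DRAFTING, State.FAILED},
    State.DRAFTING: {State.RETRIEVING, State.AWAITING_REVIEW, State.FAILED},
    State.AWAITING_REVIEW: {State.DONE, State.DRAFTING},
    State.DONE: set(),
    State.FAILED: set(),
}


@dataclass
class Run:
    state: State = State.PLANNING
    history: list[tuple[State, State, str]] = field(default_factory=list)

    def transition(self, target: State, reason: str) -> None:
        if target not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {target.value}")
        self.history.append((self.state, target, reason))  # record why the process moved
        self.state = target
```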
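For the chunk note, a quick lint pass over chunks before they go into an index. The thresholds (`max_chars`, `min_words`, and the 25% unique-word ratio) are placeholders to tune per corpus, and "duplicate" here means identical after lowercasing and collapsing whitespace, not near-duplicate.

```python
import re
from collections import defaultdict


def lint_chunks(chunks: list[str], max_chars: int = 2000, min_words: int = 5) -> dict[str, list[int]]:
    """Group chunk indices by problem: empty, duplicate, bloated, low_signal."""
    problems: dict[str, list[int]] = defaultdict(list)
    seen: set[str] = set()
    for i, chunk in enumerate(chunks):
        text = chunk.strip()
        if not text:
            problems["empty"].append(i)
            continue
        normalized = re.sub(r"\s+", " ", text.lower())
        if normalized in seen:
            problems["duplicate"].append(i)
        seen.add(normalized)
        if len(text) > max_chars:
            problems["bloated"].append(i)
        words = re.findall(r"[a-z0-9]+", normalized)
        # Too few words, or mostly repeats of the same few words (nav bars, boilerplate).
        if len(words) < min_words or len(set(words)) < len(words) / 4:
            problems["low_signal"].append(i)
    return dict(problems)
```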
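For the replay note, a record/replay sketch: wrap each nondeterministic dependency (model, tool, retriever) so a live run saves its request/response pairs, then replay them in order and fail loudly if the run asks for something different. It assumes requests and responses are JSON-serializable and survive a round trip unchanged (tuples, for example, come back as lists). `Recorder` and `Replayer` are hypothetical names.

```python
import json
from typing import Any, Callable

Request = dict[str, Any]
Response = dict[str, Any]


class Recorder:
    """Record mode: call through to the real dependency and keep every request/response pair."""

    def __init__(self, call: Callable[[Request], Response]):
        self.call = call
        self.tape: list[dict[str, Any]] = []

    def __call__(self, request: Request) -> Response:
        response = self.call(request)
        self.tape.append({"request": request, "response": response})
        return response

    def dump(self) -> str:
        return json.dumps(self.tape)


class Replayer:
    """Replay mode: serve recorded responses in order and fail loudly if the run diverges."""

    def __init__(self, dumped: str):
        self.tape = json.loads(dumped)
        self.position = 0

    def __call__(self, request: Request) -> Response:
        if self.position >= len(self.tape):
            raise RuntimeError("run made more calls than the recording has")
        entry = self.tape[self.position]
        if entry["request"] != request:
            raise RuntimeError(f"replay diverged at call {self.position}")
        self.position += 1
        return entry["response"]
```

Since both classes are plain callables, the code under test does not need to know whether it is live or replaying.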
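For the citation note, a sketch of two cheap checks: the cited source ID must exist among the sources you actually provided, and the quoted span must appear in that source after whitespace and case normalization. The `Citation` shape is hypothetical. A verbatim match only proves the span exists; whether it supports the claim still needs a human or a separate entailment check.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    source_id: str
    quote: str  # the span the model claims supports its statement


def _normalize(text: str) -> str:
    return re.sub(r"\s+", " ", text).strip().lower()


def check_citation(citation: Citation, sources: dict[str, str]) -> str:
    """Return 'ok', 'unknown_source', or 'quote_not_found'."""
    body = sources.get(citation.source_id)
    if body is None:
        return "unknown_source"  # the model cited something it was never given
    quote = _normalize(citation.quote)
    if not quote or quote not in _normalize(body):
        return "quote_not_found"
    return "ok"
```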