TIL: Quality Check RAG Chunks • Skylar Payne

RAG failures often look more sophisticated than they are.

The answer is bad, so the team debates embeddings, rerankers, model choice, prompt wording, hybrid search, and whether the whole architecture is cursed.

Then someone opens the index and finds the actual problem: duplicate chunks, empty chunks, navigation boilerplate, legal footers, ten-thousand-character blobs, and tiny fragments with no context.

The model was not the first thing broken. The corpus was messy.

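Chunk quality checks are cheap and unglamorous, which is exactly why they get skipped. Run them before indexing and again after major corpus changes. The two retrieval-frequency checks are the exception: they need query logs, so they only make sense once the index has served real or test traffic.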

Checks worth running:

empty or near-empty chunks
duplicate and near-duplicate chunks
chunks over the target size
chunks too small to mean anything
chunks dominated by boilerplate
documents with zero chunks
documents producing suspiciously many chunks
chunks that get retrieved constantly
chunks that never get retrieved
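
Most of the static checks in that list fit in one pass over the chunks. Here is a minimal sketch, not a definitive implementation. The `Chunk` shape, the thresholds, and the boilerplate phrases are all assumptions made for illustration; swap in whatever matches your corpus and chunker. Near-duplicates and the two retrieval-frequency checks are left out because they need more than a single pass.

```python
import hashlib
import re
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Chunk:  # hypothetical shape; adapt to your own chunk records
    doc_id: str
    text: str


# Illustrative thresholds and phrases (assumptions, tune for your corpus).
MIN_CHARS = 50
MAX_CHARS = 2000
MAX_CHUNKS_PER_DOC = 200
BOILERPLATE_RATIO = 0.5
BOILERPLATE_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (r"all rights reserved", r"privacy policy", r"skip to content")
]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash the same."""
    return " ".join(text.lower().split())


def audit_chunks(chunks: list[Chunk], doc_ids: list[str]) -> dict[str, list]:
    """Return flagged chunk indices (or doc ids) keyed by check name."""
    report = defaultdict(list)
    seen: set[str] = set()
    per_doc = Counter()

    for i, chunk in enumerate(chunks):
        per_doc[chunk.doc_id] += 1
        norm = normalize(chunk.text)

        if not norm:
            report["empty"].append(i)
            continue
        if len(norm) < MIN_CHARS:
            report["too_small"].append(i)
        if len(chunk.text) > MAX_CHARS:
            report["too_large"].append(i)

        # Exact duplicates after normalization; the first occurrence is kept.
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            report["duplicate"].append(i)
        else:
            seen.add(digest)

        # Share of characters covered by known boilerplate phrases.
        covered = sum(
            m.end() - m.start()
            for pattern in BOILERPLATE_PATTERNS
            for m in pattern.finditer(norm)
        )
        if covered / len(norm) >= BOILERPLATE_RATIO:
            report["boilerplate"].append(i)

    # Document-level checks need the full list of source documents.
    for doc_id in doc_ids:
        count = per_doc[doc_id]
        if count == 0:
            report["doc_zero_chunks"].append(doc_id)
        elif count > MAX_CHUNKS_PER_DOC:
            report["doc_too_many_chunks"].append(doc_id)

    return dict(report)
```

Character counts are a crude stand-in for tokens, and a phrase list is a crude stand-in for real boilerplate detection. Both are good enough to surface the worst offenders, which is the point of a first pass.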

None of this guarantees good RAG. It clears the obvious debris before you spend a week tuning the fancy parts.

A good chunk should usually mean one thing clearly enough that it can be retrieved, cited, and judged. If it contains three unrelated sections, retrieval gets fuzzy. If it contains only a heading, generation gets desperate. If it appears fifty times, your top-k can fill with repeats while the useful context sits just below the cutoff.
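
Exact-match hashing misses the repeats that differ by a timestamp or a trailing character. One common way to catch those is Jaccard similarity over word shingles. The sketch below is one such approach under assumed settings (5-word shingles, a 0.8 threshold), not the only way to do it, and the function names are made up for this example.

```python
import re


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Set of k-word windows over the lowercased word sequence of the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i : i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def near_duplicate_pairs(
    texts: list[str], threshold: float = 0.8, k: int = 5
) -> list[tuple[int, int, float]]:
    """Return (i, j, similarity) for every pair at or above the threshold.

    Compares all pairs, so cost grows quadratically with the number of chunks.
    That is fine for a sample; a full index usually needs something like
    MinHash with locality-sensitive hashing instead.
    """
    sets = [shingles(t, k) for t in texts]
    pairs = []
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            sim = jaccard(sets[i], sets[j])
            if sim >= threshold:
                pairs.append((i, j, sim))
    return pairs
```

Running this on a few thousand sampled chunks is usually enough to tell whether near-duplicates are a rounding error or a real share of the index.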

Pair chunk audits with retriever evals. Retrieval metrics tell you whether the right chunks appear; chunk audits tell you whether the candidates are worth retrieving in the first place.
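
The two retrieval-frequency checks fall out of the same logs a retriever eval produces. A rough sketch, assuming you can export one list of retrieved chunk ids per query; the log format, the function name, and the 50% "hot" cutoff are assumptions for illustration.

```python
from collections import Counter


def retrieval_coverage(
    retrieval_log: list[list[str]],
    all_chunk_ids: set[str],
    hot_fraction: float = 0.5,
) -> dict[str, list[str]]:
    """Flag chunks returned for most queries and chunks never returned at all.

    retrieval_log holds one list of retrieved chunk ids per query.
    """
    # Count each chunk once per query, even if it was returned several times.
    hits = Counter(cid for result in retrieval_log for cid in set(result))
    n_queries = len(retrieval_log)

    hot = sorted(
        cid
        for cid, n in hits.items()
        if n_queries and n / n_queries >= hot_fraction
    )
    never = sorted(all_chunk_ids - set(hits))
    return {"hot": hot, "never_retrieved": never}
```

A chunk that shows up for half of all queries is often boilerplate that happens to embed close to everything. A chunk that never shows up may be fine, or it may be too small or too garbled to match anything, so both lists are prompts to go read the chunks rather than verdicts.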

Before you tune the retriever, inspect what you handed it. Bad chunks make good retrieval look broken.

Related: RAG and AI evals.

Part of the Effective AI Engineering series.

Source: adapted from Mirascope’s “Quality Control Your RAG Chunks”, MIT licensed.