TIL: Quality Check RAG Chunks
RAG failures often look more sophisticated than they are.
The answer is bad, so the team debates embeddings, rerankers, model choice, prompt wording, hybrid search, and whether the whole architecture is cursed.
Then someone opens the index and finds the actual problem: duplicate chunks, empty chunks, navigation boilerplate, legal footers, ten-thousand-character blobs, and tiny fragments with no context.
The model was not the first thing that broke. The corpus was messy.
Chunk quality checks are cheap and unglamorous, which is exactly why they get skipped. Run them before indexing and again after major corpus changes.
Checks worth running:
- empty or near-empty chunks
- duplicate and near-duplicate chunks
- chunks over the target size
- chunks too small to mean anything
- chunks dominated by boilerplate
- documents with zero chunks
- documents producing suspiciously many chunks
- chunks that get retrieved constantly
- chunks that never get retrieved
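Most of the static checks above fit in one pass over the chunks. Here is a minimal Python sketch. It assumes each chunk is a hypothetical `Chunk` record with `doc_id`, `chunk_id`, and `text`, and that you know the full set of source document IDs. The thresholds and boilerplate patterns are placeholders, not recommendations. Set them from your own chunker's target size and the footers your corpus actually contains.

```python
import hashlib
import re
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Chunk:
    # Hypothetical record shape; adapt to whatever your chunker emits.
    doc_id: str
    chunk_id: str
    text: str


# Hypothetical thresholds: tune them to your chunker and corpus.
MIN_CHARS = 50
MAX_CHARS = 2000
MAX_CHUNKS_PER_DOC = 200
BOILERPLATE_LINE_RATIO = 0.5
BOILERPLATE_PATTERNS = re.compile(
    r"all rights reserved|privacy policy|terms of (use|service)|skip to (main )?content",
    re.IGNORECASE,
)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def boilerplate_ratio(text: str) -> float:
    """Fraction of non-blank lines that match a known boilerplate pattern."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return 0.0
    hits = sum(1 for line in lines if BOILERPLATE_PATTERNS.search(line))
    return hits / len(lines)


def audit_chunks(chunks: list[Chunk], doc_ids: set[str]) -> dict[str, list]:
    """Map each issue name to the offending chunk IDs or document IDs."""
    issues: dict[str, list] = defaultdict(list)
    first_seen = {}  # hash of normalized text -> first chunk_id with that text
    per_doc = Counter()

    for c in chunks:
        per_doc[c.doc_id] += 1
        norm = normalize(c.text)
        if not norm:
            issues["empty"].append(c.chunk_id)
            continue
        if len(norm) < MIN_CHARS:
            issues["too_small"].append(c.chunk_id)
        if len(norm) > MAX_CHARS:
            issues["oversized"].append(c.chunk_id)
        if boilerplate_ratio(c.text) >= BOILERPLATE_LINE_RATIO:
            issues["boilerplate"].append(c.chunk_id)
        # Exact duplicates (after normalization) share a hash.
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in first_seen:
            issues["duplicate"].append((c.chunk_id, first_seen[digest]))
        else:
            first_seen[digest] = c.chunk_id

    # Document-level checks: nothing produced, or suspiciously many chunks.
    issues["docs_without_chunks"] = sorted(doc_ids - set(per_doc))
    issues["docs_with_many_chunks"] = sorted(
        d for d, n in per_doc.items() if n > MAX_CHUNKS_PER_DOC
    )
    return dict(issues)
```

Hashing normalized text only catches exact copies. Near-duplicates need a similarity measure. The retrieval-frequency checks need query logs or eval runs, so this pass leaves them out.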
None of this guarantees good RAG. It clears the obvious debris before you spend a week tuning the fancy parts.
A good chunk should usually mean one thing clearly enough that it can be retrieved, cited, and judged. If it contains three unrelated sections, retrieval gets fuzzy. If it contains only a heading, generation gets desperate. If it appears fifty times, your top-k can fill with repeats while the useful context sits just below the fold.
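Copies that differ by a date or a single word slip past exact hashing but still crowd the top-k. One common way to catch them is to compare word shingles (overlapping k-word windows) using Jaccard similarity. The sketch below works that way under stated assumptions. The function names, the shingle size of 3, and the 0.8 threshold are illustrative choices. The pairwise loop is quadratic, so a large corpus would need an approximate method such as MinHash with locality-sensitive hashing.

```python
import re
from itertools import combinations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Overlapping k-word windows from lowercased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Intersection over union; empty sets score 0 so blank chunks are not paired."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def near_duplicates(
    texts: dict[str, str], threshold: float = 0.8, k: int = 3
) -> list[tuple[str, str, float]]:
    """Return (chunk_id_a, chunk_id_b, score) for every pair at or above threshold."""
    sigs = {cid: shingles(text, k) for cid, text in texts.items()}
    pairs = []
    for (id_a, sig_a), (id_b, sig_b) in combinations(sigs.items(), 2):
        score = jaccard(sig_a, sig_b)
        if score >= threshold:
            pairs.append((id_a, id_b, round(score, 3)))
    return pairs
```

Short chunks make this sensitive: one changed word removes up to k shingles from each side. Two eleven-word footers that differ only in the year share 7 of 11 distinct 3-word shingles, a score of about 0.64, so a 0.8 threshold would not flag them. Calibrate the threshold against pairs you have labeled by eye.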
Pair chunk audits with retriever evals. Retrieval metrics tell you whether the right chunks appear; chunk audits tell you whether the candidates are worth retrieving in the first place.
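Running eval queries, or a sample of real queries, through the retriever also covers the last two checks on the list. This sketch assumes a hypothetical log format with one list of retrieved chunk IDs per query. It flags chunks that appear in at least `hot_fraction` of queries, plus chunks that never appear. The 0.2 default is arbitrary.

```python
from collections import Counter


def retrieval_frequency_audit(
    retrieved_per_query: list[list[str]],
    all_chunk_ids: set[str],
    hot_fraction: float = 0.2,
) -> tuple[list[tuple[str, float]], list[str]]:
    """Return (hot chunks with the fraction of queries they appear in, never-retrieved IDs)."""
    n_queries = len(retrieved_per_query)
    if n_queries == 0:
        # No queries logged, so every chunk counts as never retrieved.
        return [], sorted(all_chunk_ids)
    # Count each chunk at most once per query so repeats in one result list don't inflate it.
    counts = Counter(cid for results in retrieved_per_query for cid in set(results))
    hot = [
        (cid, n / n_queries)
        for cid, n in counts.most_common()
        if n / n_queries >= hot_fraction
    ]
    never = sorted(all_chunk_ids - set(counts))
    return hot, never
```

A hot chunk is not automatically bad, since a genuinely central answer may deserve its popularity, but generic boilerplate often shows up this way. A never-retrieved chunk may simply be irrelevant to your query mix, or it may be unreachable because of how it was chunked. Either way it is worth a look.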
Before you tune the retriever, inspect what you handed it. Bad chunks make good retrieval look broken.
Part of the Effective AI Engineering series.
Source: adapted from Mirascope’s “Quality Control Your RAG Chunks”, MIT licensed.