TIL: Structure LLM Outputs at the Boundary
The demo asks the model for a nice answer.
Production needs fields.
A meeting summary has action items. A support answer has a confidence level and citations. A document classifier has a category, rationale, and escalation flag. The rest of the application does not want a charming paragraph. It wants something it can validate, store, route, and test.
The failure mode is familiar: the model returns text, so the application starts scraping text.
    summary = output.split("Action Items:")[0]
    items = parse_bullets(output.split("Action Items:")[1])

It works until the model says “Next steps” instead of “Action Items.” Someone adds another branch. Then another. Eventually the parser has become a second, worse prompt.
Put the structure at the model boundary:
    from datetime import date

    from pydantic import BaseModel


    class ActionItem(BaseModel):
        task: str
        owner: str | None
        due_date: date | None


    class MeetingSummary(BaseModel):
        summary: str
        decisions: list[str]
        action_items: list[ActionItem]

Ask for the shape you need. Validate it before the result enters product code. If validation fails, retry, repair, escalate, or return a typed failure. Keep the mess close to the AI call instead of spreading parsing guesses through the app.
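The validate-then-retry step could be sketched roughly as follows. This is a standard-library illustration, not the original article's implementation: `call_with_validation`, `Parsed`, `ParseFailure`, `validate_summary`, and the `call_model` callable are all hypothetical names, and the validator only checks for required keys. Pydantic's `ValidationError` subclasses `ValueError`, so `MeetingSummary.model_validate_json` (Pydantic v2) should be able to stand in for `validate` here.

```python
import json
from dataclasses import dataclass
from typing import Callable, Generic, Optional, TypeVar, Union

T = TypeVar("T")

REQUIRED_FIELDS = {"summary", "decisions", "action_items"}


@dataclass
class Parsed(Generic[T]):
    """A result that passed validation at the boundary."""
    value: T
    attempts: int


@dataclass
class ParseFailure:
    """A typed failure: callers must handle it instead of guessing."""
    errors: list[str]  # one message per failed attempt
    last_raw: str      # last raw model output, kept for debugging


def validate_summary(raw: str) -> dict:
    """Hypothetical stand-in validator: JSON object with the required keys."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return data


def call_with_validation(
    call_model: Callable[[Optional[str]], str],
    validate: Callable[[str], T],
    max_attempts: int = 3,
) -> Union[Parsed[T], ParseFailure]:
    """Call the model, validate the output, and retry with error feedback."""
    errors: list[str] = []
    raw = ""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        # call_model is assumed to accept the previous error (or None)
        # so it can ask the model to repair its own output.
        raw = call_model(feedback)
        try:
            return Parsed(validate(raw), attempts=attempt)
        except ValueError as exc:
            feedback = str(exc)
            errors.append(feedback)
    return ParseFailure(errors=errors, last_raw=raw)
```

Product code then branches on `Parsed` versus `ParseFailure` in one place, and nothing downstream ever sees the raw string.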
Typed outputs are a reliability pattern. They let you measure schema failure rates, build regression cases, and decide which failures deserve a better prompt, better examples, or a stricter product path.
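As a rough illustration of measuring schema failure rates, the sketch below replays saved raw outputs through a validator and tallies what failed. The names `schema_failure_report` and `validate_summary` are hypothetical, the validator is a simplified stand-in that raises `ValueError`, and the sample outputs are made up for the example.

```python
import json
from collections import Counter
from typing import Callable, Iterable

REQUIRED_FIELDS = {"summary", "decisions", "action_items"}


def validate_summary(raw: str) -> dict:
    """Hypothetical stand-in validator: JSON object with the required keys."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict) or REQUIRED_FIELDS - set(data):
        raise ValueError("output does not match the expected shape")
    return data


def schema_failure_report(
    raw_outputs: Iterable[str],
    validate: Callable[[str], object],
) -> dict:
    """Replay raw outputs through the validator and summarize failures."""
    total = 0
    by_error: Counter = Counter()
    failing: list[str] = []  # candidates for regression fixtures
    for raw in raw_outputs:
        total += 1
        try:
            validate(raw)
        except ValueError as exc:
            by_error[type(exc).__name__] += 1
            failing.append(raw)
    failed = len(failing)
    return {
        "total": total,
        "failed": failed,
        "failure_rate": failed / total if total else 0.0,
        "by_error": dict(by_error),
        "failing": failing,
    }
```

The `failing` list doubles as a source of regression cases: each raw output that broke the schema once can be saved and replayed after a prompt or schema change.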
Later changes get less scary too. Adding citations or requires_human_review becomes a schema change with an obvious owner, not a hunt through every place that split a string.
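One way such a schema change might look is sketched below, using a standard-library dataclass as a stand-in for the Pydantic model. The class name `MeetingSummaryV2`, the helper `summary_from_dict`, and the choice of defaults (empty `citations`, `requires_human_review=False`) are assumptions for illustration, and action items are simplified to plain dicts.

```python
from dataclasses import dataclass, field


@dataclass
class MeetingSummaryV2:
    """Hypothetical next version of the schema with two added fields."""
    summary: str
    decisions: list[str]
    action_items: list[dict]  # simplified: plain dicts instead of a nested model
    # New fields carry defaults, so payloads stored before the change still load.
    citations: list[str] = field(default_factory=list)
    requires_human_review: bool = False


def summary_from_dict(data: dict) -> MeetingSummaryV2:
    """Build the typed object; unknown or missing keys raise TypeError."""
    return MeetingSummaryV2(**data)
```

Whether `requires_human_review` should default to `False` is a product decision. Defaulting it to `True` would be the more conservative choice if older results were never reviewed.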
When software depends on an LLM result, treat it like an API response. Put the structure at the boundary and make free text opt-in, not the default.
Related: AI reliability and AI platforms.
Part of the Effective AI Engineering series.
Source: adapted from Mirascope’s “Structure Your Outputs for Reliable Systems”, MIT licensed.