Skylar Payne

The demo asks the model for a nice answer.

Production needs fields.

A meeting summary has action items. A support answer has a confidence level and citations. A document classifier has a category, rationale, and escalation flag. The rest of the application does not want a charming paragraph. It wants something it can validate, store, route, and test.

The failure mode is familiar: the model returns text, so the application starts scraping text.

summary = output.split("Action Items:")[0]
items = parse_bullets(output.split("Action Items:")[1])

It works until the model says “Next steps” instead of “Action Items.” Someone adds another branch. Then another. Eventually the parser has become a second, worse prompt.
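To make the breakage concrete, here is a small reproduction of the scraping approach from the snippet above. The `parse_bullets` body, the `scrape` wrapper, and both sample outputs are assumptions made up for illustration; the original only names `parse_bullets` without showing it.

```python
def parse_bullets(block: str) -> list[str]:
    """Hypothetical helper: collect lines that start with '- '."""
    return [line[2:].strip() for line in block.splitlines() if line.startswith("- ")]


def scrape(output: str) -> tuple[str, list[str]]:
    """Same approach as the snippet above: split on a fixed heading."""
    parts = output.split("Action Items:")
    return parts[0].strip(), parse_bullets(parts[1])


# Invented sample outputs: one with the expected heading, one where it drifted.
expected = "We agreed to ship Friday.\nAction Items:\n- Ana: update the changelog\n"
drifted = "We agreed to ship Friday.\nNext steps:\n- Ana: update the changelog\n"

summary, items = scrape(expected)  # heading matches, so both parts exist

try:
    scrape(drifted)
    drift_error = None
except IndexError as exc:
    # split() never found "Action Items:", so there is no second part to index
    drift_error = exc
```

The content of the two outputs is identical. Only the heading changed, and that was enough to turn a working parser into an exception in product code.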

Put the structure at the model boundary:

from datetime import date

from pydantic import BaseModel

class ActionItem(BaseModel):
    task: str
    owner: str | None
    due_date: date | None

class MeetingSummary(BaseModel):
    summary: str
    decisions: list[str]
    action_items: list[ActionItem]

Ask for the shape you need. Validate it before the result enters product code. If validation fails, retry, repair, escalate, or return a typed failure. Keep the mess close to the AI call instead of spreading parsing guesses through the app.

Typed outputs are a reliability pattern. They let you measure schema failure rates, build regression cases, and decide which failures deserve a better prompt, better examples, or a stricter product path.
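Measuring schema failures can be as simple as replaying recorded raw outputs through the schema and counting what breaks. The sketch below assumes Pydantic v2; the `SupportAnswer` fields, the `schema_failure_report` function, and the recorded samples are all invented for illustration.

```python
from collections import Counter

from pydantic import BaseModel, ValidationError


class SupportAnswer(BaseModel):
    # Hypothetical schema for the support-answer case mentioned earlier.
    answer: str
    confidence: float
    citations: list[str]


def schema_failure_report(raw_outputs: list[str]) -> tuple[float, Counter]:
    """Return the share of outputs that fail validation and a count per failing field."""
    failures = 0
    by_field: Counter = Counter()
    for raw in raw_outputs:
        try:
            SupportAnswer.model_validate_json(raw)
        except ValidationError as exc:
            failures += 1
            for err in exc.errors():
                # err["loc"] is the path to the offending field; it is empty when
                # the whole payload is at fault, for example invalid JSON.
                field = ".".join(str(part) for part in err["loc"]) or "<root>"
                by_field[field] += 1
    rate = failures / len(raw_outputs) if raw_outputs else 0.0
    return rate, by_field


# Invented recorded outputs; in practice these would come from logs.
recorded = [
    '{"answer": "Reset it in settings.", "confidence": 0.9, "citations": ["kb-12"]}',
    '{"answer": "Reset it in settings.", "confidence": "high", "citations": []}',
    '{"answer": "Reset it in settings.", "confidence": 0.4}',
    "Sure! Here is the answer: reset it in settings.",
]
rate, by_field = schema_failure_report(recorded)
```

The same recorded list doubles as a regression set: after a prompt change, rerun it and compare the rate and the per-field counts to see whether the change helped or just moved the failures.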

Later changes get less scary too. Adding citations or requires_human_review becomes a schema change with an obvious owner, not a hunt through every place that split a string.
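As a rough illustration of what such a schema change could look like, the sketch below extends a hypothetical support-answer model with both fields. It assumes Pydantic v2; the class names, the `Citation` shape, and the sample payloads are assumptions, not taken from any real codebase.

```python
from pydantic import BaseModel, Field


class Citation(BaseModel):
    source_id: str
    quote: str


class SupportAnswerV1(BaseModel):
    answer: str
    confidence: float


class SupportAnswerV2(SupportAnswerV1):
    # New fields carry defaults so payloads recorded before the change still validate.
    citations: list[Citation] = Field(default_factory=list)
    requires_human_review: bool = False


old_payload = '{"answer": "Reset it in settings.", "confidence": 0.9}'
new_payload = (
    '{"answer": "Reset it in settings.", "confidence": 0.3, '
    '"citations": [{"source_id": "kb-12", "quote": "Open Settings, then Reset."}], '
    '"requires_human_review": true}'
)

old = SupportAnswerV2.model_validate_json(old_payload)
new = SupportAnswerV2.model_validate_json(new_payload)
```

Giving the new fields defaults is one design choice among several. Making them required instead would force every caller and every recorded test case to be updated at once, which is stricter but sometimes what a product path needs.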

When software depends on an LLM result, treat it like an API response. Put the structure at the boundary and make free text opt-in, not the default.

Related: AI reliability and AI platforms.

Part of the Effective AI Engineering series.

Source: adapted from Mirascope’s “Structure Your Outputs for Reliable Systems”, MIT licensed.