LLM Engineering
April 10, 2026 · 8 min read

Getting Reliable Structured Output from LLMs: JSON Mode, Pydantic, and the Patterns That Hold Up

Moneeb Abbas

AI Systems Architect

Unstructured text from an LLM is useful for conversations. For pipelines, classification tasks, data extraction, and agent tool results, you need structured output — a JSON object you can parse, validate, and hand off to the next step. The naive approach of asking the model to 'respond in JSON' breaks unpredictably. This post covers what actually works.

Why 'Respond in JSON' Is Not Enough

Asking an LLM to respond in JSON produces JSON most of the time — but not all of the time. The failure modes include: JSON wrapped in a markdown code block (```json ... ```), a sentence before the JSON ('Here is the data you requested:'), trailing commas in objects, truncated JSON when the output is long, and fields that do not match the schema you described in the prompt.
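
If you are stuck with prompt-only JSON for a while, the failure modes above split into two groups: wrappers you can strip, and damage you cannot repair. The sketch below illustrates the difference using only the standard library. `salvage_json` is a hypothetical helper name, not part of any SDK, and it assumes the reply contains a single top-level object.

```python
import json

FENCE = "`" * 3  # three backticks, built indirectly to keep this snippet readable


def salvage_json(raw: str) -> dict:
    """Best-effort recovery of one JSON object from a chatty model reply."""
    start = raw.find("{")  # skip any preamble or code-fence opener
    if start == -1:
        raise ValueError("no JSON object in reply")
    try:
        # raw_decode parses one value and ignores whatever trails it
        obj, _end = json.JSONDecoder().raw_decode(raw, start)
    except json.JSONDecodeError as exc:
        raise ValueError(f"unrecoverable JSON: {exc.msg}") from exc
    return obj


replies = {
    "fenced": f'{FENCE}json\n{{"total": 12.5}}\n{FENCE}',
    "preamble": 'Here is the data you requested:\n{"total": 12.5}',
    "trailing comma": '{"total": 12.5,}',
    "truncated": '{"total": 12.5, "items": ["a", "b',
}

for label, reply in replies.items():
    try:
        print(label, "->", salvage_json(reply))
    except ValueError as err:
        print(label, "-> failed:", err)
```

This is a stopgap, not a fix: it recovers the two wrapper cases and deliberately refuses to guess at the trailing comma and the truncated reply.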

At low volume, you parse the output, catch the exception, and move on. At production volume the same failure rate adds up: at 10,000 requests per day, a 2% parse failure rate means 200 dropped extractions every day. You need a systematic approach.

Level 1 — JSON Mode

Most major LLM providers offer a JSON mode or response format parameter that constrains the model to produce valid JSON. This eliminates markdown wrapping, prose preambles, and syntax errors:

python
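import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
# invoice_text: the raw invoice text, loaded elsewhere in your pipeline

# OpenAI JSON mode
response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[
        {
            "role": "system",
            # JSON mode requires the word "JSON" to appear somewhere in the messages
            "content": "Extract the invoice fields and return JSON.",
        },
        {"role": "user", "content": invoice_text},
    ],
)
data = json.loads(response.choices[0].message.content)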

JSON mode guarantees syntactically valid JSON, provided the response is not cut off: with OpenAI, for example, a response that stops at the token limit (a finish_reason of "length") can still be incomplete. It does not guarantee that the JSON matches your schema. The model might return the right fields with wrong types, omit fields you need, or add extra fields you did not ask for. For simple schemas with 2–3 fields, JSON mode is usually sufficient. For complex schemas, you need structured outputs.
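
To make that gap concrete, here is a small sketch that validates a JSON-mode style reply against a Pydantic model. The `Invoice` model and the sample reply are made up for illustration. The reply parses fine as JSON but carries a wrong type and an extra field.

```python
import json

from pydantic import BaseModel, ConfigDict, ValidationError


class Invoice(BaseModel):
    # extra="forbid" turns unexpected keys into validation errors
    model_config = ConfigDict(extra="forbid")

    vendor_name: str
    amount_due: float


# Hypothetical model reply: valid JSON, wrong shape
raw = '{"vendor_name": "Acme", "amount_due": "twelve hundred", "notes": "n/a"}'

json.loads(raw)  # succeeds, so a syntax check alone would accept this reply

try:
    Invoice.model_validate_json(raw)
    failed_fields = set()
except ValidationError as exc:
    # each error records the field it came from in "loc"
    failed_fields = {err["loc"][0] for err in exc.errors()}

print(sorted(failed_fields))
```

Both problems surface as field-level errors, which is exactly the information a retry prompt needs.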

Level 2 — Structured Outputs (Schema-Constrained Generation)

OpenAI's structured outputs feature, and similar constrained-generation capabilities from other providers, restrict decoding so that at each step the model can only emit tokens that keep the output a valid JSON document matching your schema. This is a stronger guarantee than JSON mode:

python
from typing import Optional

from openai import OpenAI
from pydantic import BaseModel

class InvoiceExtraction(BaseModel):
    vendor_name: str
    invoice_number: str
    amount_due: float
    due_date: Optional[str]  # ISO 8601 date or None if not found
    line_items: list[str]

client = OpenAI()
# invoice_text: the raw invoice text, loaded elsewhere in your pipeline
response = client.beta.chat.completions.parse(
    model="gpt-4o",
    response_format=InvoiceExtraction,  # pass the Pydantic model directly
    messages=[{"role": "user", "content": invoice_text}],
)

message = response.choices[0].message
if message.refusal:  # the model declined, so there is no parsed object
    raise RuntimeError(message.refusal)

invoice = message.parsed  # typed InvoiceExtraction object
print(invoice.amount_due)  # a float whenever parsing succeeded
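
The idea behind schema-constrained generation is easier to see in miniature. The sketch below is a toy illustration of token masking, not any provider's actual implementation: the vocabulary, the scores, and the hard-coded "grammar" for the schema {"ok": boolean} are all invented for the example.

```python
import json
import math

# Toy grammar for the schema {"ok": <boolean>}: the tokens allowed at each step.
ALLOWED_BY_STEP = [{"{"}, {'"ok"'}, {":"}, {"true", "false"}, {"}"}]


def constrained_decode(logits_per_step):
    """Greedy decoding where tokens that would break the grammar are masked out."""
    output = []
    for allowed, logits in zip(ALLOWED_BY_STEP, logits_per_step):
        masked = {
            token: (score if token in allowed else -math.inf)
            for token, score in logits.items()
        }
        output.append(max(masked, key=masked.get))
    return "".join(output)


# Made-up scores: at every step the unconstrained model prefers a chatty token.
VOCAB = ["{", '"ok"', ":", "true", "false", "}", "Sure", "!"]
fake_logits = [{token: 0.0 for token in VOCAB} for _ in ALLOWED_BY_STEP]
for step_logits in fake_logits:
    step_logits["Sure"] = 5.0  # the prose the model "wants" to emit
fake_logits[3]["false"] = 2.0  # among the legal values, it leans toward false

unconstrained_first = max(fake_logits[0], key=fake_logits[0].get)
text = constrained_decode(fake_logits)
print(unconstrained_first, "vs", text, json.loads(text))
```

The model still chooses among the legal tokens, which is why constrained output can be well-formed and wrong at the same time. Real implementations derive the allowed-token sets from the JSON schema rather than a fixed list.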

Level 3 — Instructor for Any Provider

The Instructor library wraps any LLM provider's API to add Pydantic-validated structured output with automatic retry on validation failure. It works with OpenAI, Anthropic, Gemini, Mistral, and local models via Ollama:

python
import re
from typing import Optional

import instructor
from anthropic import Anthropic
from pydantic import BaseModel, field_validator

class ContractParties(BaseModel):
    client_name: str
    vendor_name: str
    effective_date: str  # YYYY-MM-DD format
    contract_value: Optional[float]

    @field_validator("effective_date")
    @classmethod
    def validate_date_format(cls, v: str) -> str:
        # fullmatch rejects trailing text such as a time component
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", v):
            raise ValueError(f"Date must be YYYY-MM-DD, got: {v}")
        return v

# Instructor automatically retries if Pydantic validation fails
client = instructor.from_anthropic(Anthropic())

# contract_text: the raw contract text, loaded elsewhere in your pipeline
result = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    response_model=ContractParties,
    messages=[{"role": "user", "content": contract_text}],
    max_retries=3,  # retry up to 3 times on validation failure
)

print(result.effective_date)  # matches YYYY-MM-DD once validation passes
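
One detail worth checking in a format validator like the one above is how strict the pattern really is. A prefix match accepts any string that merely starts with a date, and a digits-only pattern accepts impossible dates. The sketch below shows a stricter check using only the standard library; `is_iso_date` is a hypothetical helper name.

```python
import re
from datetime import date

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_iso_date(value: str) -> bool:
    """True only for a real calendar date written exactly as YYYY-MM-DD."""
    if not ISO_DATE.fullmatch(value):  # the whole string must match, not a prefix
        return False
    try:
        date.fromisoformat(value)  # rejects month 13, day 40, and similar
    except ValueError:
        return False
    return True


for candidate in ["2024-01-15", "2024-01-15T09:30:00", "2024-13-40"]:
    prefix_ok = bool(ISO_DATE.match(candidate))
    print(f"{candidate!r}: prefix match={prefix_ok}, strict={is_iso_date(candidate)}")
```

Dropping a check like this into a field validator means the retry loop also catches dates that look right but are not.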

Schema Design Principles for LLMs

A well-designed schema reduces parse failures even before validation kicks in. The principles that matter most:

  • Use Optional for fields that may not be present: Never use a required field for information that might not be in the source document. An Optional field with None means 'not found'; a required field that the model cannot populate gets hallucinated.
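  • Prefer strings over enums for uncertain values: If you are not sure the source values map cleanly onto your enum, use a string and validate programmatically. Schema-constrained generation forces the model to pick one of the listed members even when none fits, and a validation error you can see is better than a silently wrong enum value.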
  • Avoid deeply nested schemas on weaker models: Flat or shallowly nested schemas produce fewer errors than deeply nested ones. If your schema has 4+ levels of nesting, consider flattening it.
  • Provide field descriptions: Most structured output implementations allow description metadata on fields. 'The total invoice amount in USD as a decimal number, e.g. 1234.56' produces fewer type errors than 'amount'.
  • Break large extractions into multiple smaller schemas: Extracting 20 fields in one call produces more errors than two calls extracting 10 fields each.
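
A minimal sketch of the first and fourth principles in Pydantic, assuming Pydantic v2. The `InvoiceFields` model and its descriptions are illustrative, not a recommended schema.

```python
from typing import Optional

from pydantic import BaseModel, Field


class InvoiceFields(BaseModel):
    vendor_name: str = Field(
        description="Legal name of the company that issued the invoice"
    )
    amount_due: float = Field(
        description="The total invoice amount in USD as a decimal number, e.g. 1234.56"
    )
    # Optional with a None default: 'not found' is a legal answer
    due_date: Optional[str] = Field(
        default=None,
        description="Payment due date as YYYY-MM-DD, or null if the document does not state one",
    )


# The descriptions travel with the JSON schema that is sent to the model
schema = InvoiceFields.model_json_schema()
print(schema["required"])
print(schema["properties"]["amount_due"]["description"])
```

One caveat: OpenAI's strict structured outputs expect every field to be listed as required and express "may be missing" as a nullable type, so for that API the form used earlier in this post (Optional[str] with no default) is the one to use.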

Retry Strategy for Parse Failures

When structured output fails despite JSON mode or constrained generation, a retry with the validation error in context usually resolves it:

python
from pydantic import BaseModel, ValidationError

# `llm` is a placeholder for your own client wrapper; `llm.complete(...)` is
# assumed to return the model's raw JSON string.
def extract_with_retry(text: str, schema: type[BaseModel], max_retries: int = 3) -> BaseModel:
    messages = [{"role": "user", "content": text}]

    for attempt in range(max_retries):
        raw = llm.complete(messages, response_format={"type": "json_object"})
        try:
            return schema.model_validate_json(raw)
        except ValidationError as e:
            if attempt == max_retries - 1:
                raise
            # Feed the error back to the model with the previous attempt
            messages += [
                {"role": "assistant", "content": raw},
                {
                    "role": "user",
                    "content": f"Your response had validation errors: {e}\n\nPlease fix and return valid JSON.",
                },
            ]

    # Only reached when max_retries < 1, so no attempt was made
    raise ValueError("max_retries must be at least 1")
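
To watch the loop work without calling a real API, here is a self-contained variant that swaps the model for a scripted stand-in. `ScriptedModel`, `Ticket`, and the canned replies are all hypothetical; the retry logic is the same pattern as above, trimmed for brevity.

```python
from pydantic import BaseModel, ValidationError


class Ticket(BaseModel):
    title: str
    priority: int  # 1 (low) to 5 (urgent)


class ScriptedModel:
    """Hypothetical stand-in for an LLM client: replays canned replies in order."""

    def __init__(self, replies):
        self._replies = iter(replies)
        self.seen = []  # snapshot of the messages sent on each call

    def complete(self, messages):
        self.seen.append(list(messages))
        return next(self._replies)


def extract(model, text, schema, max_retries=3):
    messages = [{"role": "user", "content": text}]
    for attempt in range(max_retries):
        raw = model.complete(messages)
        try:
            return schema.model_validate_json(raw)
        except ValidationError as exc:
            if attempt == max_retries - 1:
                raise
            # Show the model its own reply plus the validation error
            messages += [
                {"role": "assistant", "content": raw},
                {"role": "user", "content": f"Your response had validation errors: {exc}"},
            ]


model = ScriptedModel([
    '{"title": "Login fails", "priority": "urgent"}',  # wrong type for priority
    '{"title": "Login fails", "priority": 5}',  # corrected reply
])
ticket = extract(model, "Users cannot log in since the last deploy.", Ticket)
print(ticket.priority, len(model.seen), len(model.seen[1]))
```

The second call carries three messages: the original request, the failed reply, and the validation error. That context is what lets the model correct itself instead of repeating the mistake.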

Note: In production, structured output with schema enforcement (OpenAI structured outputs, or Instructor's validate-and-retry loop) has a parse failure rate below 0.1% on well-designed schemas. The naive JSON prompt approach runs at 2–5%. At 100,000 daily extractions, that is the difference between fewer than 100 and 2,000–5,000 failed records per day.
