Why 'Respond in JSON' Is Not Enough
Asking an LLM to respond in JSON produces JSON most of the time — but not all of the time. The failure modes include: JSON wrapped in a markdown code block (```json ... ```), a sentence before the JSON ('Here is the data you requested:'), trailing commas in objects, truncated JSON when the output is long, and fields that do not match the schema you described in the prompt.
At low volume, you parse the output, catch the exception, and move on. At production volume the same failure rate adds up: at 10,000 requests per day, a 2% parse failure rate means 200 dropped extractions every day. You need a systematic approach.
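Before reaching for provider features, it can help to see what a tolerant parser has to cope with. The sketch below is illustrative only: `parse_loose_json` is a hypothetical helper, not part of any library. It strips a markdown fence or a prose preamble, and it reports trailing commas and truncation as errors rather than trying to repair them.

```python
import json
import re

FENCE = "`" * 3  # the three-backtick markdown fence marker
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_loose_json(raw: str):
    """Best-effort parse. Returns (data, None) on success or (None, error message)."""
    text = raw.strip()
    fenced = FENCE_RE.search(text)
    if fenced:
        # Case 1: JSON wrapped in a markdown code block
        text = fenced.group(1).strip()
    else:
        # Case 2: a sentence before the JSON; skip to the first { or [
        starts = [i for i in (text.find("{"), text.find("[")) if i != -1]
        if starts:
            text = text[min(starts):]
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        # Trailing commas and truncated output end up here
        return None, str(exc)
```

A parser like this only recovers the wrapping problems. Trailing commas, truncation, and schema mismatches still surface as failures, which is why the levels that follow move the guarantee into the generation step.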
Level 1 — JSON Mode
Most major LLM providers offer a JSON mode or response-format parameter that constrains the model to produce valid JSON. This eliminates markdown wrapping, prose preambles, and syntax errors such as trailing commas, although a response cut off at the token limit can still arrive truncated:
# OpenAI JSON mode
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# invoice_text is assumed to hold the raw text of one invoice.
# JSON mode requires the word "JSON" to appear somewhere in the messages.
response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[
        {
            "role": "system",
            "content": "Extract the invoice fields and return JSON.",
        },
        {"role": "user", "content": invoice_text},
    ],
)
data = json.loads(response.choices[0].message.content)

JSON mode guarantees syntactically valid JSON, provided the response is not cut off at the token limit. It does not guarantee that the JSON matches your schema: the model might return the right fields with wrong types, omit fields you expected, or add extra fields you did not ask for. For simple schemas with 2–3 fields, JSON mode is usually sufficient. For complex schemas, you need structured outputs.
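To make the gap between "valid JSON" and "matches the schema" concrete, here is a minimal standard-library sketch. The field list in `EXPECTED_FIELDS` and the helper `schema_problems` are assumptions made up for illustration; in practice a validation library such as Pydantic does this job.

```python
import json

# Hypothetical expected shape of an invoice extraction
EXPECTED_FIELDS = {
    "vendor_name": str,
    "invoice_number": str,
    "amount_due": (int, float),
}


def schema_problems(raw: str) -> list[str]:
    """List the ways a syntactically valid JSON string deviates from EXPECTED_FIELDS."""
    data = json.loads(raw)  # assumes raw is already valid JSON, as JSON mode provides
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = []
    for name, expected in EXPECTED_FIELDS.items():
        if name not in data:
            problems.append(f"missing field: {name}")
        elif isinstance(data[name], bool) or not isinstance(data[name], expected):
            problems.append(f"wrong type for {name}: {type(data[name]).__name__}")
    for name in sorted(data.keys() - EXPECTED_FIELDS.keys()):
        problems.append(f"unexpected field: {name}")
    return problems


# Parses fine, yet shows all three kinds of schema mismatch
raw = '{"vendor_name": "Acme", "amount_due": "1,234.56", "currency": "USD"}'
for problem in schema_problems(raw):
    print(problem)
```

The sample string passes `json.loads` without complaint, but it is missing `invoice_number`, carries `amount_due` as a string, and adds a `currency` field nobody asked for.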
Level 2 — Structured Outputs (Schema-Constrained Generation)
OpenAI's structured outputs feature, and similar constrained-generation capabilities from other providers, work at the decoding level: at each generation step, tokens that would make the output violate your schema are masked out, so the model can only emit a valid JSON document matching that schema. This is a stronger guarantee than JSON mode:
from typing import Optional

from openai import OpenAI
from pydantic import BaseModel


class InvoiceExtraction(BaseModel):
    vendor_name: str
    invoice_number: str
    amount_due: float
    due_date: Optional[str]  # ISO 8601 date or None if not found
    line_items: list[str]


client = OpenAI()
response = client.beta.chat.completions.parse(
    model="gpt-4o",
    response_format=InvoiceExtraction,  # pass the Pydantic model directly
    messages=[{"role": "user", "content": invoice_text}],
)
invoice = response.choices[0].message.parsed  # typed InvoiceExtraction object
print(invoice.amount_due)  # float, guaranteed

Level 3 — Instructor for Any Provider
The Instructor library wraps any LLM provider's API to add Pydantic-validated structured output with automatic retry on validation failure. It works with OpenAI, Anthropic, Gemini, Mistral, and local models via Ollama:
import re
from typing import Optional

import instructor
from anthropic import Anthropic
from pydantic import BaseModel, field_validator


class ContractParties(BaseModel):
    client_name: str
    vendor_name: str
    effective_date: str  # YYYY-MM-DD format
    contract_value: Optional[float]

    @field_validator("effective_date")
    @classmethod
    def validate_date_format(cls, v: str) -> str:
        # fullmatch rejects trailing characters that re.match would let through;
        # this checks the shape only, not whether the date exists on the calendar
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", v):
            raise ValueError(f"Date must be YYYY-MM-DD, got: {v}")
        return v


# Instructor automatically retries if Pydantic validation fails
client = instructor.from_anthropic(Anthropic())
result = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    response_model=ContractParties,
    messages=[{"role": "user", "content": contract_text}],
    max_retries=3,  # retry up to 3 times on validation failure
)
print(result.effective_date)  # guaranteed YYYY-MM-DD

Schema Design Principles for LLMs
A well-designed schema reduces parse failures even before validation kicks in. The principles that matter most:
- Use Optional for fields that may not be present: Never use a required field for information that might not be in the source document. An Optional field with None means 'not found'; a required field that the model cannot populate gets hallucinated.
- Prefer strings over enums for uncertain values: If you are not sure the model will always produce a valid enum member, use a string and validate programmatically. A validation error is better than a silent wrong enum value.
- Avoid deeply nested schemas on weaker models: Flat or shallowly nested schemas produce fewer errors than deeply nested ones. If your schema has 4+ levels of nesting, consider flattening it.
- Provide field descriptions: Most structured output implementations allow description metadata on fields. 'The total invoice amount in USD as a decimal number, e.g. 1234.56' produces fewer type errors than 'amount'.
- Break large extractions into multiple smaller schemas: Extracting 20 fields in one call produces more errors than two calls extracting 10 fields each.
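Several of these principles can be shown in one small Pydantic v2 model. This is a sketch under assumptions: `InvoiceSummary`, its fields, and the `KNOWN_CURRENCIES` allow-list are invented for illustration. Some schema-constrained APIs require every field to be marked as required, in which case the model should emit an explicit null instead of relying on the default.

```python
from typing import Optional

from pydantic import BaseModel, Field, field_validator

KNOWN_CURRENCIES = {"USD", "EUR", "GBP"}  # hypothetical allow-list


class InvoiceSummary(BaseModel):
    # Flat schema with a description on every field
    vendor_name: str = Field(description="Legal name of the vendor issuing the invoice")
    amount_due: float = Field(
        description="The total invoice amount as a decimal number, e.g. 1234.56"
    )
    # Optional: None means "not found" instead of forcing a guess
    due_date: Optional[str] = Field(
        default=None,
        description="Payment due date as YYYY-MM-DD, or null if the invoice states none",
    )
    # A plain string validated in code instead of an enum
    currency: str = Field(description="ISO 4217 currency code, e.g. USD")

    @field_validator("currency")
    @classmethod
    def check_currency(cls, v: str) -> str:
        code = v.strip().upper()
        if code not in KNOWN_CURRENCIES:
            raise ValueError(f"Unknown currency code: {v!r}")
        return code


record = InvoiceSummary.model_validate(
    {"vendor_name": "Acme", "amount_due": 1234.56, "currency": "usd"}
)
print(record.due_date, record.currency)  # None USD
```

The descriptions travel with the schema (they appear in `InvoiceSummary.model_json_schema()`), so the model sees them wherever the provider forwards the JSON Schema.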
Retry Strategy for Parse Failures
When structured output fails despite JSON mode or constrained generation, a retry with the validation error in context usually resolves it:
from pydantic import BaseModel, ValidationError


# `llm` stands in for your provider client; complete() is assumed to return the raw response text.
def extract_with_retry(text: str, schema: type[BaseModel], max_retries: int = 3) -> BaseModel:
    messages = [{"role": "user", "content": text}]
    for attempt in range(max_retries):
        raw = llm.complete(messages, response_format={"type": "json_object"})
        try:
            return schema.model_validate_json(raw)
        except ValidationError as e:
            if attempt == max_retries - 1:
                raise
            # Feed the error back to the model with the previous attempt
            messages += [
                {"role": "assistant", "content": raw},
                {
                    "role": "user",
                    "content": f"Your response had validation errors: {e}\n\nPlease fix and return valid JSON.",
                },
            ]
    # Only reached when max_retries < 1, so the loop never ran
    raise ValueError("max_retries must be at least 1")
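One way to exercise this feedback loop without network calls is to pass the completion and validation steps in as functions. The variant below is a sketch, not the function above: `retry_with_feedback`, `make_scripted_llm`, and `validate_invoice` are hypothetical names, and the scripted client simply replays canned replies.

```python
import json
from typing import Any, Callable

Message = dict[str, str]


def retry_with_feedback(
    text: str,
    complete: Callable[[list[Message]], str],
    validate: Callable[[str], Any],
    max_retries: int = 3,
) -> Any:
    """Call complete(), validate the reply, and feed errors back until it passes."""
    messages: list[Message] = [{"role": "user", "content": text}]
    for attempt in range(max_retries):
        raw = complete(messages)
        try:
            return validate(raw)
        except ValueError as exc:
            if attempt == max_retries - 1:
                raise
            messages += [
                {"role": "assistant", "content": raw},
                {
                    "role": "user",
                    "content": f"Your response had validation errors: {exc}\n\nPlease fix and return valid JSON.",
                },
            ]
    raise ValueError("max_retries must be at least 1")


def make_scripted_llm(replies: list[str]):
    """Fake client: returns the canned replies in order and records each call."""
    calls: list[list[Message]] = []
    remaining = iter(replies)

    def complete(messages: list[Message]) -> str:
        calls.append(list(messages))  # snapshot, since the caller mutates the list
        return next(remaining)

    return complete, calls


def validate_invoice(raw: str) -> dict:
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or not isinstance(data.get("amount_due"), (int, float)):
        raise ValueError("amount_due must be a number")
    return data


# First reply has the wrong type; the second, sent after the error feedback, is valid
complete, calls = make_scripted_llm(['{"amount_due": "12.50"}', '{"amount_due": 12.5}'])
result = retry_with_feedback("invoice text", complete, validate_invoice)
print(result, len(calls))
```

Recording the calls makes it easy to assert that the second request really carried the previous attempt and the error message, which is the part of a retry loop most likely to break silently.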