How-to · Intermediate · 3 min read

How to use Pydantic with LLM structured outputs

Quick answer
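Define the expected output schema as a Pydantic model, then validate the LLM's JSON response with the model's model_validate_json() class method (for a raw JSON string) or model_validate() (for an already-parsed dict). These are the Pydantic v2 names; parse_raw() and parse_obj() are their deprecated v1 equivalents. Validation gives you typed Python objects and raises a ValidationError when the response does not match the schema.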

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" "pydantic>=2.0" (quote the specifiers so the shell does not treat >= as a redirect)

Setup

Install the required packages and set your OpenAI API key as an environment variable.

bash
pip install "openai>=1.0" "pydantic>=2.0"
export OPENAI_API_KEY="your-api-key"  # placeholder; substitute your own key

Step by step

Define a Pydantic model representing the expected structured output, call the LLM to generate a JSON string, then parse and validate it with Pydantic.

python
import os
from openai import OpenAI
from pydantic import BaseModel, ValidationError

# Define Pydantic model for structured output
class Person(BaseModel):
    name: str
    age: int
    email: str

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

prompt = (
    "Generate a JSON object with keys: name (string), age (int), email (string)."
)

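response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
    # JSON mode: ask the API for a single valid JSON object with no extra text
    response_format={"type": "json_object"},
)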

# Extract the content (expected to be JSON string)
json_str = response.choices[0].message.content

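try:
    # Parse and validate the JSON string (model_validate_json is the
    # Pydantic v2 replacement for the deprecated parse_raw)
    person = Person.model_validate_json(json_str)
    print("Parsed structured output:", person)
except ValidationError as e:
    # Raised for malformed JSON as well as for missing or mistyped fields
    print("Validation error:", e)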
output
Parsed structured output: name='Alice' age=30 email='alice@example.com'
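
The prompt above lists the keys by hand. One alternative, sketched here as an assumption rather than a required step, is to derive the instructions from the model itself with Pydantic v2's model_json_schema(), so the prompt and the validator cannot drift apart. The prompt wording is hypothetical and may need tuning for your model.

```python
import json

from pydantic import BaseModel


class Person(BaseModel):
    name: str
    age: int
    email: str


# model_json_schema() returns the model's JSON Schema as a plain dict (Pydantic v2).
schema = Person.model_json_schema()

# Hypothetical prompt wording; adjust it to your model and task.
prompt = (
    "Return only a JSON object that conforms to this JSON Schema, "
    "with no surrounding text:\n" + json.dumps(schema, indent=2)
)
print(prompt)
```

The resulting string can be passed as the user message in place of the hand-written prompt.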

Common variations

  • Use model_validate() if you convert the LLM output to a Python dict first (e.g., with json.loads()); it replaces the deprecated parse_obj().
  • Use the OpenAI SDK's AsyncOpenAI client for async calls.
  • Adapt the Pydantic model for nested or more complex structured outputs.
  • Use other LLMs like claude-3-5-sonnet-20241022 with similar parsing logic.
python
import json

# Example using model_validate with a dict (replaces the deprecated parse_obj);
# json_str comes from the previous example
json_dict = json.loads(json_str)
person = Person.model_validate(json_dict)
print(person)
output
name='Alice' age=30 email='alice@example.com'
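
For the nested case mentioned in the list above, a minimal sketch might look like the following. The Address and Customer models and the sample strings are hypothetical stand-ins for a real LLM response; the point is that Pydantic validates nested objects recursively and reports the path of any field that fails.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class Address(BaseModel):
    city: str
    country: str


class Customer(BaseModel):
    name: str
    age: int
    addresses: List[Address]  # each list item is validated as an Address


# Stand-in for an LLM response; a real one would come from the API call.
sample = (
    '{"name": "Alice", "age": 30, '
    '"addresses": [{"city": "Paris", "country": "FR"}]}'
)
customer = Customer.model_validate_json(sample)
print(customer.addresses[0].city)

# The nested "country" field is missing here, so validation fails.
bad = '{"name": "Alice", "age": 30, "addresses": [{"city": "Paris"}]}'
try:
    Customer.model_validate_json(bad)
except ValidationError as exc:
    # Each error carries a "loc" tuple giving the path to the failing field.
    error_paths = [err["loc"] for err in exc.errors()]
    print(error_paths)
```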

Troubleshooting

  • If you get a ValidationError, compare the raw LLM output with your Pydantic model; the error lists each field that is missing or has the wrong type.
  • Use prompt engineering to instruct the LLM to output strict JSON without extra text.
  • Wrap parsing in try-except to handle malformed or unexpected outputs gracefully.
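
The points above can be combined into a small defensive parser. This is a sketch under stated assumptions: extract_json and parse_person are hypothetical helper names, and the brace-matching heuristic is a simple fallback for responses that wrap the JSON in prose or a Markdown code fence, not a full JSON parser.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    name: str
    age: int
    email: str


def extract_json(text: str) -> str:
    """Return the span from the first '{' to the last '}' (simple heuristic)."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        return text  # nothing that looks like an object; let validation fail
    return text[start : end + 1]


def parse_person(raw: str) -> Optional[Person]:
    """Validate raw LLM text as a Person, returning None on failure."""
    try:
        return Person.model_validate_json(extract_json(raw))
    except ValidationError as exc:
        # Report which field failed and why instead of crashing.
        for err in exc.errors():
            print(err["loc"], err["msg"])
        return None


fence = "`" * 3  # Markdown code fence, built here to keep the sample readable
wrapped = (
    "Here you go:\n" + fence + "json\n"
    '{"name": "Alice", "age": 30, "email": "alice@example.com"}\n' + fence
)
wrong_type = '{"name": "Bob", "age": "thirty", "email": "bob@example.com"}'

print(parse_person(wrapped))     # a Person instance
print(parse_person(wrong_type))  # None, after printing the age error
```

Returning None keeps the caller in control: it can retry the request, fall back to a default, or log the raw response for inspection.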

Key Takeaways

  • Define clear Pydantic models to enforce structured output schemas from LLMs.
  • Parse LLM JSON responses with model_validate_json() or model_validate() (the Pydantic v2 replacements for parse_raw() and parse_obj()) for type safety.
  • Use prompt instructions to get clean JSON output from the LLM.
  • Handle validation errors to make your integration robust.
  • Adapt models for nested or complex data structures as needed.
Verified 2026-04 · gpt-4o, claude-3-5-sonnet-20241022