How to use Pydantic BaseModel for LLM output

Quick answer

Use Pydantic BaseModel to define a schema for the expected LLM output, then parse the LLM's JSON response into this model for type-safe access and validation. This approach ensures structured, reliable data handling from raw LLM text output.

Prerequisites

Python 3.8+
OpenAI API key (free tier works)
pip install "openai>=1.0" "pydantic>=2.0" (quote the version specifiers so the shell does not treat > as a redirect)

Setup

Install the required packages and set your OpenAI API key as an environment variable.

Install packages: pip install openai pydantic
Set environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)

bash

pip install openai pydantic

Step by step

Define a Pydantic BaseModel that represents the structured output you expect from the LLM. Then ask the LLM for JSON, parse the response with model_validate_json() (the Pydantic v2 method; the older parse_raw() still works but is deprecated), and read the typed fields from the resulting object.

python

import os
from openai import OpenAI
from pydantic import BaseModel, ValidationError

# Define the Pydantic model for expected output
class Person(BaseModel):
    name: str
    age: int
    email: str

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Prompt the LLM to output JSON matching the Person model
prompt = (
    "Generate a JSON object with fields: name (string), age (integer), and email (string)."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
    # JSON mode: request a bare JSON object (the prompt must mention JSON)
    response_format={"type": "json_object"},
)

llm_output = response.choices[0].message.content
print("Raw LLM output:", llm_output)

try:
    # Parse and validate the JSON string (Pydantic v2; parse_raw() is the deprecated v1 name)
    person = Person.model_validate_json(llm_output)
    print("Parsed output:", person)
    print(f"Name: {person.name}, Age: {person.age}, Email: {person.email}")
except ValidationError as e:
    # Pydantic v2 reports both malformed JSON and schema mismatches as ValidationError
    print("Validation error:", e)

output

Raw LLM output: {"name": "Alice Smith", "age": 30, "email": "alice@example.com"}
Parsed output: name='Alice Smith' age=30 email='alice@example.com'
Name: Alice Smith, Age: 30, Email: alice@example.com
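
The prompt in the example describes the fields by hand. One possible alternative, sketched here under the assumption of Pydantic v2, is to derive that description from the model itself with model_json_schema(), so the prompt and the validation schema cannot drift apart. The helper name build_prompt and the wording of the instruction are illustrative choices, not part of any library.

```python
import json
from typing import Type

from pydantic import BaseModel


class Person(BaseModel):
    name: str
    age: int
    email: str


def build_prompt(model_cls: Type[BaseModel]) -> str:
    """Hypothetical helper: embed the model's JSON Schema in the prompt text."""
    schema = json.dumps(model_cls.model_json_schema(), indent=2)
    return (
        "Return only a JSON object that conforms to this JSON Schema, "
        "with no surrounding text or code fences:\n" + schema
    )


prompt = build_prompt(Person)
print(prompt)
```

The resulting string can be passed as the user message content in place of the hand-written prompt. Embedding the schema does not guarantee conforming output, so the validation step is still needed.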

Common variations

You can adapt this pattern for different models, dict input, async calls, or nested schemas:

Use other LLMs like claude-3-5-sonnet-20241022 with the Anthropic SDK.
Use model_validate() (parse_obj() in Pydantic v1) if you convert the LLM output string to a dict first.
For async, use the AsyncOpenAI client with asyncio and await its create() call.
Validate nested or complex schemas with nested BaseModel classes.

python

import asyncio
import os
from openai import AsyncOpenAI
from pydantic import BaseModel

class Address(BaseModel):
    street: str
    city: str

class User(BaseModel):
    name: str
    age: int
    address: Address  # nested model, validated recursively

async def main():
    # AsyncOpenAI is the async client in openai>=1.0; its create() call is awaitable
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    prompt = "Generate JSON with name, age, and nested address (street, city)."
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    output = response.choices[0].message.content
    user = User.model_validate_json(output)
    print(user)

asyncio.run(main())

output

name='Bob' age=25 address=Address(street='123 Main St', city='Springfield')
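
The dict-first variation listed above can be sketched without any API call. This is a minimal example assuming Pydantic v2, where model_validate() is the counterpart of the v1 parse_obj(); the raw string is a hard-coded stand-in for response.choices[0].message.content, not real model output.

```python
import json

from pydantic import BaseModel


class Address(BaseModel):
    street: str
    city: str


class User(BaseModel):
    name: str
    age: int
    address: Address


# Stand-in for the LLM response text (assumed, for illustration only)
raw = (
    '{"name": "Bob", "age": 25, '
    '"address": {"street": "123 Main St", "city": "Springfield"}}'
)

data = json.loads(raw)            # may raise json.JSONDecodeError on malformed JSON
user = User.model_validate(data)  # may raise pydantic.ValidationError on a schema mismatch
print(user.address.city)
```

Going through a dict is useful when you want to inspect or adjust the data (for example, renaming a key) before validation. In this path malformed JSON surfaces as json.JSONDecodeError, separately from ValidationError.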

Troubleshooting

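If you get a ValidationError, compare the LLM output with your BaseModel schema: every required field must be present, and each value must be convertible to its declared type.
If JSON parsing fails, make sure the LLM returns valid JSON only, with no surrounding prose or Markdown code fences. Instruct it explicitly, or with OpenAI models request JSON mode via response_format={"type": "json_object"}. Note that in Pydantic v2, model_validate_json() reports malformed JSON as a ValidationError as well.
Use try-except blocks to catch and handle parsing errors gracefully.
For inconsistent output, consider post-processing the text before parsing or using a stricter prompt to enforce the JSON format.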
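
The points above can be combined into one defensive parsing routine. The following is a sketch under stated assumptions: Pydantic v2, and an LLM that sometimes wraps its JSON in a Markdown code fence. The helpers strip_code_fence and parse_person are hypothetical names introduced for this example, and the sample strings are hand-written stand-ins for LLM output.

```python
import re
from typing import Optional

from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    name: str
    age: int
    email: str


# Three backticks, built programmatically so this example contains no literal fence
FENCE_MARK = "`" * 3
FENCE = re.compile(
    r"^" + FENCE_MARK + r"(?:json)?\s*\n(.*?)\n" + FENCE_MARK + r"\s*$",
    re.DOTALL,
)


def strip_code_fence(text: str) -> str:
    """Return the body of a fenced block, or the stripped text if there is no fence."""
    text = text.strip()
    match = FENCE.match(text)
    return match.group(1) if match else text


def parse_person(raw: str) -> Optional[Person]:
    """Validate raw LLM text as a Person; print each problem and return None on failure."""
    try:
        return Person.model_validate_json(strip_code_fence(raw))
    except ValidationError as exc:
        # Each entry names the failing field (loc), an error type, and a message
        for err in exc.errors():
            print(err["loc"], err["type"], err["msg"])
        return None


fenced = (
    FENCE_MARK
    + 'json\n{"name": "Alice Smith", "age": 30, "email": "alice@example.com"}\n'
    + FENCE_MARK
)
print(parse_person(fenced))                                      # fenced but valid
print(parse_person('{"name": "Alice Smith", "age": "thirty"}'))  # wrong type, missing field
print(parse_person("Sure! Here is the JSON you asked for."))     # not JSON at all
```

Returning None keeps the caller's control flow simple. A real application might instead retry the request or feed the error messages back to the model, which is a design choice this sketch leaves open.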

Key Takeaways

Define a Pydantic BaseModel schema matching your expected LLM JSON output for type-safe parsing.
Use model_validate_json() (Pydantic v2; parse_raw() in v1) to convert raw LLM JSON strings into validated Python objects.
Always handle parsing and validation errors so that malformed LLM output cannot crash your program.
Adapt the pattern for nested schemas, async calls, or different LLM providers as needed.

Verified 2026-04 · gpt-4o-mini, claude-3-5-sonnet-20241022
