
How to design Pydantic schemas for extraction

Quick answer

Use Pydantic models to define structured schemas representing the expected data fields for extraction. Pass these models as response_model to the instructor client when calling chat.completions.create to get typed, validated extraction results from AI responses.

PREREQUISITES

Python 3.8+
OpenAI API key (free tier works)
pip install "openai>=1.0" instructor pydantic

Setup

Install the required packages and set your OpenAI API key as an environment variable.

bash

pip install "openai>=1.0" instructor pydantic
export OPENAI_API_KEY="your-api-key"

Step by step

Define a Pydantic model representing the data you want to extract, then use instructor.from_openai to create a client wrapping the OpenAI SDK. Call chat.completions.create with your schema as response_model to get structured extraction.

python

import os
from pydantic import BaseModel
from openai import OpenAI
import instructor

# Define Pydantic schema for extraction
class User(BaseModel):
    name: str
    age: int

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Wrap with instructor for structured extraction
extractor = instructor.from_openai(client)

# Prompt with extraction request
prompt = "Extract: John is 30 years old"

# Call chat completion with response_model
response = extractor.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
    response_model=User
)

# With response_model set, the call returns a validated User instance
user = response
print(f"Name: {user.name}, Age: {user.age}")

output

Name: John, Age: 30
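
A schema can carry more guidance than bare type annotations. The sketch below assumes Pydantic v2 and adds field descriptions and range constraints to the User model with Field, then prints the JSON Schema that Pydantic derives for the age field. The payload is hand-written to stand in for a model reply, so no API call is made; the description texts and the 0 to 130 range are illustrative choices, not values from the original example.

```python
from pydantic import BaseModel, Field


class User(BaseModel):
    """A person mentioned in the source text."""

    # Descriptions document what each field should contain
    name: str = Field(description="Full name as written in the text")
    # ge/le reject ages outside an (illustrative) plausible range
    age: int = Field(ge=0, le=130, description="Age in whole years")


# Hand-written stand-in for a model reply; "30" is a string on purpose
user = User.model_validate({"name": "John", "age": "30"})
print(f"Name: {user.name}, Age: {user.age}")

# Inspect the JSON Schema generated for the age field
age_schema = User.model_json_schema()["properties"]["age"]
print(age_schema)
```

In Pydantic's default (lax) mode the string "30" is coerced to the integer 30, which is useful when a reply quotes a number.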

Common variations

For asynchronous extraction, wrap an AsyncOpenAI client with instructor.from_openai and call await extractor.chat.completions.create(...).
Switch to Anthropic by using instructor.from_anthropic with an Anthropic client; the Anthropic API requires max_tokens on every call.
Define nested or optional fields in Pydantic models for complex extraction tasks.
Use different OpenAI models like gpt-4o or gpt-4o-mini depending on cost and accuracy needs.
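
As a minimal sketch of the nested and optional fields mentioned above, assuming Pydantic v2: the Address and Person models and the sample payload are hypothetical, and the payload is validated locally instead of coming from an API call.

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class Address(BaseModel):
    city: str
    # Optional: stays None when the text does not mention a country
    country: Optional[str] = None


class Person(BaseModel):
    name: str
    # Optional: the source text may not state an age
    age: Optional[int] = None
    # Nested list of sub-models; defaults to an empty list
    addresses: List[Address] = Field(default_factory=list)


# Hypothetical payload shaped like a model reply
payload = {
    "name": "Ana",
    "addresses": [{"city": "Lisbon"}, {"city": "Porto", "country": "Portugal"}],
}

person = Person.model_validate(payload)
print(person.age)
print([address.city for address in person.addresses])
```

Giving optional fields a default of None lets validation pass when the text simply omits a value, so the model is not pushed to invent one.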

Troubleshooting

If extraction fields are missing or incorrect, ensure your prompt clearly instructs the AI to provide the data in the expected format.
Make sure the field names and types in your Pydantic schema match the data you expect back; a mismatch raises a validation error.
Check your API key and environment variables if you get authentication errors.
If the result is cut off, raise the max_tokens parameter so the model has enough tokens to complete the extraction.
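
When a validation error does occur, it helps to see which field failed. The sketch below assumes Pydantic v2; describe_errors is a hypothetical helper and bad_payload is a hand-written example of a malformed reply, so nothing here calls the API.

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    name: str
    age: int


def describe_errors(payload: dict) -> list:
    """Return (field path, error type) pairs, or an empty list if valid."""
    try:
        User.model_validate(payload)
    except ValidationError as exc:
        return [
            (".".join(str(part) for part in err["loc"]), err["type"])
            for err in exc.errors()
        ]
    return []


# Hypothetical malformed reply: age cannot be parsed as an integer
bad_payload = {"name": "John", "age": "thirty"}

problems = describe_errors(bad_payload)
for field, kind in problems:
    print(f"{field}: {kind}")
```

instructor also accepts a max_retries argument on create that re-asks the model after a failed validation; check its documentation for the current behavior before relying on it.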

Key Takeaways

Define clear Pydantic models to represent the exact data you want extracted.
Use instructor with response_model to get typed, validated AI extraction results.
Adjust prompts and schema carefully to ensure accurate and complete extraction.
Async and Anthropic clients are supported for flexible integration.
Always set your API key securely via environment variables.

Verified 2026-04 · gpt-4o-mini, gpt-4o, claude-3-5-sonnet-20241022