How to validate LLM output with Instructor
Quick answer
Use the instructor Python package to wrap an LLM client such as OpenAI and pass a response_model defined with pydantic. This enables automatic validation and structured extraction of LLM outputs by declaring the expected fields and types.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0 instructor pydantic
Setup
Install the required packages and set your OpenAI API key as an environment variable.
- Install packages:

  ```shell
  pip install openai instructor pydantic
  ```

- Set the environment variable in your shell:

  ```shell
  export OPENAI_API_KEY='your_api_key'
  ```

Step by step
Define a pydantic.BaseModel for the expected output schema, create an Instructor client from the OpenAI client, then call chat.completions.create with response_model to validate and parse the LLM output.
```python
import os

import instructor
from openai import OpenAI
from pydantic import BaseModel

# Define the expected structured output
class User(BaseModel):
    name: str
    age: int

# Initialize the OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Wrap with Instructor for validation
inst_client = instructor.from_openai(client)

# Prompt to extract structured data
messages = [{"role": "user", "content": "Extract: John is 30 years old"}]

# Call with response_model to validate and parse
user = inst_client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=User,
    messages=messages,
)

print(f"Name: {user.name}, Age: {user.age}")
```

Output:

```
Name: John, Age: 30
```
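What Instructor enforces here is ordinary pydantic validation of the model's JSON reply. A minimal sketch of that validation step, using plain pydantic with no API call (the sample dicts are made up for illustration):

```python
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    name: str
    age: int

# Well-formed output passes; pydantic also coerces "30" -> 30
user = User.model_validate({"name": "John", "age": "30"})

# Malformed output raises ValidationError, which Instructor surfaces
# (or retries on) instead of returning unvalidated data
try:
    User.model_validate({"name": "John", "age": "thirty"})
except ValidationError as e:
    print(e.error_count(), "validation error")
```

This is why a mismatched schema fails loudly at call time rather than producing silently wrong fields.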
Common variations
The same pattern works for Anthropic Claude models via instructor.from_anthropic(). You can also define more complex pydantic models with nested fields or lists for richer validation, and async calls are supported by wrapping the provider's async client (e.g. anthropic.AsyncAnthropic).
```python
import asyncio
import os

import anthropic
import instructor
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

async def main():
    # Use the async client so the create call can be awaited
    client = anthropic.AsyncAnthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
    inst_client = instructor.from_anthropic(client)
    messages = [{"role": "user", "content": "Extract: Alice is 25 years old"}]
    user = await inst_client.chat.completions.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,  # required by the Anthropic API
        response_model=User,
        messages=messages,
    )
    print(f"Name: {user.name}, Age: {user.age}")

asyncio.run(main())
```

Output:

```
Name: Alice, Age: 25
```
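Richer schemas follow the same pattern: nest models and use lists, and Instructor validates the whole structure. A sketch using plain pydantic so it runs without an API key (the Team/members names are illustrative; the same Team class could be passed as response_model):

```python
from typing import List

from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

class Team(BaseModel):
    team_name: str
    members: List[User]

# Simulated LLM JSON output, validated exactly as Instructor would
data = {
    "team_name": "Research",
    "members": [
        {"name": "Alice", "age": 25},
        {"name": "Bob", "age": 30},
    ],
}
team = Team.model_validate(data)
print(team.members[0].name)  # Alice
```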
Troubleshooting
- If you get a ValidationError, check that the LLM output matches the pydantic model fields and types exactly.
- Ensure your OPENAI_API_KEY or ANTHROPIC_API_KEY environment variable is set correctly.
- Use print(user.model_dump_json()) (pydantic v2; user.json() in v1) to debug the parsed output structure.
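To inspect what was actually parsed, dump the validated model back to JSON or a dict; a minimal sketch with plain pydantic (no API call needed):

```python
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

user = User(name="John", age=30)

# pydantic v2 serialization helpers for debugging parsed fields
print(user.model_dump_json())  # compact JSON string
print(user.model_dump())       # plain Python dict
```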
Key Takeaways
- Use instructor with response_model to enforce structured output validation from LLMs.
- Define expected output schemas with pydantic.BaseModel for type-safe extraction.
- Instructor supports both OpenAI and Anthropic clients, with sync and async usage.
- Validation errors indicate mismatches between the LLM output and your schema, helping catch problems early.