How-to · Beginner · 3 min read

How to validate LLM output with Instructor

Quick answer
Use the instructor Python package to wrap an LLM client such as OpenAI's and pass a response_model built with pydantic. Instructor then validates the LLM's output against your declared fields and types and returns a parsed model instance, so malformed output surfaces as an error instead of bad data.

Prerequisites

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" instructor pydantic (quote the version spec so the shell doesn't interpret >)

Setup

Install the required packages and set your OpenAI API key as an environment variable.

bash
pip install "openai>=1.0" instructor pydantic
export OPENAI_API_KEY='your_api_key'

Step by step

Define a pydantic.BaseModel for the expected output schema, create an Instructor client from the OpenAI client, then call chat.completions.create with response_model to validate and parse the LLM output.

python
import os
from openai import OpenAI
import instructor
from pydantic import BaseModel

# Define the expected structured output
class User(BaseModel):
    name: str
    age: int

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Wrap with Instructor for validation
inst_client = instructor.from_openai(client)

# Prompt to extract structured data
messages = [{"role": "user", "content": "Extract: John is 30 years old"}]

# Call with response_model to validate and parse
user = inst_client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=User,
    messages=messages
)

print(f"Name: {user.name}, Age: {user.age}")
output
Name: John, Age: 30
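
What comes back from Instructor is an ordinary pydantic model instance, so all the usual pydantic methods work on it. A minimal sketch with a hand-built instance (no API call needed):

```python
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

# Instructor returns a validated instance exactly like this one
user = User(name="John", age=30)
print(user.model_dump())       # plain dict
print(user.model_dump_json())  # JSON string
```

This means the result can go straight into your application code, a database layer, or a serializer with no extra parsing step.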

Common variations

Instructor also works with Anthropic Claude models via instructor.from_anthropic(). You can define richer pydantic models with nested objects or lists for deeper validation, and async usage is supported by wrapping an async client (anthropic.AsyncAnthropic or openai.AsyncOpenAI) and awaiting the call.

python
import asyncio
import os

import anthropic
import instructor
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

async def main():
    # Use the async Anthropic client so the create() call can be awaited
    client = anthropic.AsyncAnthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
    inst_client = instructor.from_anthropic(client)

    messages = [{"role": "user", "content": "Extract: Alice is 25 years old"}]

    user = await inst_client.chat.completions.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,  # required by the Anthropic API
        response_model=User,
        messages=messages
    )

    print(f"Name: {user.name}, Age: {user.age}")

asyncio.run(main())
output
Name: Alice, Age: 25
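
Nested schemas work the same way: any pydantic model, including ones with lists and sub-models, can serve as a response_model. A sketch of such a schema, validated here against hand-written data (the field names are illustrative, not from the example above):

```python
from typing import List
from pydantic import BaseModel

class Address(BaseModel):
    city: str
    country: str

class Person(BaseModel):
    name: str
    addresses: List[Address]

# Passing Person as response_model would make Instructor
# validate the nested list of addresses as well
data = {"name": "Alice", "addresses": [{"city": "Paris", "country": "France"}]}
person = Person.model_validate(data)
print(person.addresses[0].city)  # Paris
```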

Troubleshooting

  • If you get a pydantic ValidationError, check that the LLM output matches your model's fields and types; passing max_retries to create() lets Instructor re-prompt the model with the validation error attached.
  • Ensure the OPENAI_API_KEY or ANTHROPIC_API_KEY environment variable is set correctly.
  • Use print(user.model_dump_json()) (pydantic v2) to inspect the parsed output structure.
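
To see what a ValidationError looks like before one hits you in production, you can reproduce the validation step Instructor runs internally. A sketch simulating a model that returned a non-integer age:

```python
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    name: str
    age: int

# "thirty" cannot be parsed as an int, so validation fails
raw = {"name": "John", "age": "thirty"}
try:
    User.model_validate(raw)
except ValidationError as e:
    print(e.errors()[0]["type"])  # int_parsing
```

The errors() list pinpoints which field failed and why, which is usually enough to tell whether the fix belongs in your schema or in your prompt.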

Key Takeaways

  • Use instructor with response_model to enforce structured output validation from LLMs.
  • Define expected output schemas with pydantic.BaseModel for type-safe extraction.
  • Instructor supports both OpenAI and Anthropic clients with sync and async usage.
  • Validation errors indicate mismatches between LLM output and your schema, helping catch errors early.
Verified 2026-04 · gpt-4o-mini, claude-3-5-sonnet-20241022