
How to use Instructor for data extraction

Quick answer
Use the instructor Python library to wrap the OpenAI client for structured data extraction: define a Pydantic model and pass it as response_model to client.chat.completions.create. The call then returns a validated instance of that model, so you can pull typed fields out of unstructured text with minimal code.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" instructor pydantic (quote the version specifier so the shell does not treat > as a redirect)

Setup

Install the required packages and set your OpenAI API key in the environment.

  • Install packages: pip install openai instructor pydantic
  • Set environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)
bash
pip install openai instructor pydantic
output
Collecting openai
Collecting instructor
Collecting pydantic
Successfully installed openai instructor pydantic
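
Before making any API calls, it can help to confirm that the key is actually visible to Python. The following is a minimal sketch using only the standard library; require_env is a hypothetical helper name, not part of openai or instructor.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error.

    Hypothetical helper for illustration; not part of openai or instructor.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set or is empty.")
    return value


# Report the status of the key without printing the key itself
try:
    require_env("OPENAI_API_KEY")
    print("OPENAI_API_KEY is set")
except RuntimeError as exc:
    print(exc)
```

Failing early with a named variable tends to be easier to diagnose than an authentication error returned by the API.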

Step by step

Define a Pydantic model for the data you want to extract, then call instructor.from_openai to create a client that wraps the OpenAI client. Call chat.completions.create with your model as response_model and pass the user prompt in messages.

python
import os
from openai import OpenAI
import instructor
from pydantic import BaseModel

# Define the data model for extraction
class User(BaseModel):
    name: str
    age: int

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Wrap with Instructor for structured extraction
instructor_client = instructor.from_openai(client)

# User prompt with data to extract
messages = [{"role": "user", "content": "Extract: John is 30 years old"}]

# Call chat completion with response_model
user = instructor_client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=User,
    messages=messages
)

print(f"Name: {user.name}, Age: {user.age}")
output
Name: John, Age: 30
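
To see what passing a Pydantic model as response_model buys you, it may help to look at the validation step on its own. The sketch below does not call any API and is not Instructor's internal code; it assumes Pydantic v2 (for model_validate_json) and uses hand-written JSON strings as stand-ins for a model reply.

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    name: str
    age: int


# A well-formed payload, similar in shape to what the model is asked to produce
good = User.model_validate_json('{"name": "John", "age": 30}')
print(good.name, good.age)

# A payload whose age cannot be parsed as an integer is rejected
try:
    User.model_validate_json('{"name": "John", "age": "thirty"}')
    rejected = False
except ValidationError:
    rejected = True
print(rejected)
```

The first payload yields a typed User object, while the second raises ValidationError instead of silently returning bad data.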

Common variations

You can switch to a different OpenAI model such as gpt-4o, or use claude-3-5-sonnet-20241022 by wrapping an Anthropic client with instructor.from_anthropic (this also needs pip install anthropic). For async usage, wrap an async client such as anthropic.AsyncAnthropic and await the create call inside an async function. Note that the Anthropic API requires max_tokens on every request.

python
import asyncio
import os
import anthropic
import instructor
from pydantic import BaseModel

# Define the data model for extraction
class User(BaseModel):
    name: str
    age: int

async def main():
    # Async Anthropic client, so the create call below can be awaited
    client = anthropic.AsyncAnthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
    instructor_client = instructor.from_anthropic(client)

    messages = [{"role": "user", "content": "Extract: Alice is 25 years old"}]

    user = await instructor_client.chat.completions.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,  # required by the Anthropic API
        response_model=User,
        messages=messages
    )

    print(f"Name: {user.name}, Age: {user.age}")

asyncio.run(main())
output
Name: Alice, Age: 25

Troubleshooting

  • If extracted fields are None or missing, check that the field names and types in your Pydantic model match the data that actually appears in the text.
  • If you get API errors, verify that the OPENAI_API_KEY or ANTHROPIC_API_KEY environment variable is set correctly.
  • If the model truncates its output, shorten the prompt or increase max_tokens.
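
When the source text may omit a field, one option is to declare that field as optional and describe each field so the schema is more explicit. This is a sketch under the assumption of Pydantic v2; UserLenient and its field descriptions are hypothetical, chosen only for illustration.

```python
from typing import Optional

from pydantic import BaseModel, Field


class UserLenient(BaseModel):
    # Descriptions become part of the model's JSON schema
    name: str = Field(description="The person's name as written in the text")
    # Optional with a default, so a missing age does not fail validation
    age: Optional[int] = Field(default=None, description="Age in years, if stated")


# Input without an age still validates; age falls back to None
partial = UserLenient.model_validate({"name": "John"})
print(partial.age)

# A numeric string is coerced to int in Pydantic's default (lax) mode
full = UserLenient.model_validate({"name": "John", "age": "30"})
print(full.age)
```

Keeping required fields required and relaxing only the ones that can legitimately be absent makes a None result meaningful rather than ambiguous.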

Key Takeaways

  • Use instructor with Pydantic models to extract structured data from text easily.
  • Wrap the OpenAI or Anthropic client with instructor.from_openai or instructor.from_anthropic respectively.
  • Pass your Pydantic model as response_model in chat.completions.create for automatic parsing.
  • Async calls (using an async client such as AsyncAnthropic) and different models are supported.
  • Always verify environment variables and model names to avoid runtime errors.
Verified 2026-04 · gpt-4o-mini, claude-3-5-sonnet-20241022