How to extract structured data with Instructor

Quick answer

Define a pydantic.BaseModel that describes the data you want, wrap the OpenAI client with instructor.from_openai, then call client.chat.completions.create with response_model=YourModel. The call returns a validated instance of your model instead of raw text. The examples here use OpenAI's gpt-4o-mini; other chat models are called the same way.

PREREQUISITES

Python 3.8+
OpenAI API key (free tier works)
pip install "openai>=1.0" instructor pydantic (quote the version specifier so the shell does not treat > as a redirect)

Setup

Install the required packages and set your OpenAI API key as an environment variable.

Install packages: pip install openai instructor pydantic
Set environment variable in your shell: export OPENAI_API_KEY='your_api_key'

bash

pip install openai instructor pydantic
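
Before making any API calls, it can help to confirm that the key is actually visible to Python. The following is a minimal sketch using only the standard library; the helper name `require_api_key` is hypothetical and is not part of openai or instructor.

```python
import os

def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key found in env, or raise a clear error if it is missing or blank."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; export it in your shell before running the examples"
        )
    return key
```

Calling `require_api_key()` at the top of a script fails fast with a readable message, which is easier to diagnose than an authentication error raised later by the API client.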

Step by step

Define a pydantic.BaseModel for the structured data you want to extract, then use instructor.from_openai to create a client wrapping OpenAI. Call client.chat.completions.create with your model and input text to get typed structured output.

python

import os
from pydantic import BaseModel
import instructor
from openai import OpenAI

# Define your structured data model
class User(BaseModel):
    name: str
    age: int

# Initialize OpenAI client
openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Wrap OpenAI client with Instructor
client = instructor.from_openai(openai_client)

# Input text to extract from
input_text = "Extract: John is 30 years old"

# Call chat completion with response_model
response = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=User,
    messages=[{"role": "user", "content": input_text}]
)

# The return value is already a validated User instance, not a raw completion
user = response
print(f"Name: {user.name}, Age: {user.age}")

output

Name: John, Age: 30
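
To see what the response model contributes, the two pydantic steps involved can be reproduced offline: deriving a JSON Schema from the model, and validating a JSON reply into a typed object. This is a sketch with pydantic v2 only and no API call; the `raw_json` string is a made-up stand-in for a model reply, and Instructor's actual internals may differ in detail.

```python
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

# JSON Schema derived from the model: the fields and types a reply must satisfy
schema = User.model_json_schema()
print(schema["required"])                    # ['name', 'age']
print(schema["properties"]["age"]["type"])   # integer

# Hypothetical raw reply, validated and converted into a typed User object
raw_json = '{"name": "John", "age": 30}'
user = User.model_validate_json(raw_json)
print(f"Name: {user.name}, Age: {user.age}")
```

Because validation happens on the pydantic side, `user.age` is a real `int` here, not a string that still needs parsing.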

Common variations

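You can use different OpenAI models like gpt-4o or gpt-4o-mini depending on your accuracy and cost needs. Instructor also supports Anthropic models via instructor.from_anthropic. For asynchronous usage, wrap an AsyncOpenAI client with instructor.from_openai and await client.chat.completions.create(...) inside an async function. The openai>=1.0 SDK has no acreate method.

python

import asyncio
import instructor
from openai import AsyncOpenAI

# Wrap the async client; AsyncOpenAI reads OPENAI_API_KEY from the environment
async_client = instructor.from_openai(AsyncOpenAI())

async def async_extract():
    # User is the pydantic model defined in the previous example
    response = await async_client.chat.completions.create(
        model="gpt-4o-mini",
        response_model=User,
        messages=[{"role": "user", "content": "Extract: Alice is 25 years old"}]
    )
    print(f"Name: {response.name}, Age: {response.age}")

asyncio.run(async_extract())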

output

Name: Alice, Age: 25

Troubleshooting

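If you get validation errors, check that the field types in your pydantic model match what the text can actually supply (for example, make a field Optional if the value may be absent). Passing max_retries to create lets Instructor re-ask the model after a failed validation.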
If the API returns unexpected results, try adding more context or examples in the prompt.
Make sure your OPENAI_API_KEY environment variable is set correctly.
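
A validation error usually means an extracted value does not fit a field's type. The sketch below uses pydantic v2 only, with a made-up `reply` dict and a hypothetical `UserLoose` model, to show what such an error reports and one way to make a field tolerant of missing data.

```python
from typing import Optional
from pydantic import BaseModel, Field, ValidationError

class User(BaseModel):
    name: str
    age: int

class UserLoose(BaseModel):
    name: str
    # Optional with a default lets extraction succeed when the text gives no age
    age: Optional[int] = Field(default=None, description="Age in years, if stated")

# Hypothetical reply where no age could be found in the text
reply = {"name": "Alice", "age": None}

failed_fields = []
try:
    User.model_validate(reply)
    strict_ok = True
except ValidationError as exc:
    strict_ok = False
    # Each error entry names the failing field and the reason
    failed_fields = [err["loc"] for err in exc.errors()]
    print(failed_fields)

loose = UserLoose.model_validate(reply)
print(loose)
```

The strict model rejects the reply because `age` must be an integer, while the loose model accepts it and leaves `age` as `None`. The field description becomes part of the model's JSON Schema, so it can double as a hint about what the field means.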

Key Takeaways

Define your data schema with pydantic.BaseModel for typed extraction.
Use instructor.from_openai to wrap OpenAI client for structured responses.
Pass response_model=YourModel to chat.completions.create for automatic parsing.
Async calls (via AsyncOpenAI) and other providers such as Anthropic are supported.
Validate your model and prompt to improve extraction accuracy.

Verified 2026-04 · gpt-4o-mini, gpt-4o
