How-to · Beginner · 3 min read

How to batch extract with Instructor

Quick answer
Use the instructor library to wrap an OpenAI client, then loop over your inputs and call client.chat.completions.create with a response_model for each one. Instructor validates every response against a Pydantic model, so you can extract structured data from many texts in a single script.

Prerequisites

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" instructor pydantic (quote the version specifier so the shell does not treat > as a redirect)

Setup

Install the required packages and set your OpenAI API key as an environment variable.

  • Install packages: pip install openai instructor pydantic
  • Set environment variable in your shell: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)

Step by step

Define a Pydantic model for the structured data you want to extract, then use instructor.from_openai to create a client wrapping the OpenAI SDK. Loop over your batch of texts and call client.chat.completions.create with response_model to extract data for each input.

python
import os
from openai import OpenAI
import instructor
from pydantic import BaseModel

# Define the structured data model
class User(BaseModel):
    name: str
    age: int

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Wrap with Instructor client
inst_client = instructor.from_openai(client)

# Batch of texts to extract from
texts = [
    "Extract: John is 30 years old",
    "Extract: Alice is 25 years old",
    "Extract: Bob is 40 years old"
]

# Extract data in batch
results = []
for text in texts:
    response = inst_client.chat.completions.create(
        model="gpt-4o-mini",
        response_model=User,
        messages=[{"role": "user", "content": text}]
    )
    results.append(response)

# Print extracted data
for res in results:
    print(f"Name: {res.name}, Age: {res.age}")
output
Name: John, Age: 30
Name: Alice, Age: 25
Name: Bob, Age: 40
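The response_model here is an ordinary Pydantic model, and instructor validates each LLM response against it. You can sanity-check the schema locally without any API call; note that Pydantic coerces compatible input types, such as a numeric string to an int:

```python
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

# Pydantic coerces compatible types, e.g. the string "30" to the int 30
user = User.model_validate({"name": "John", "age": "30"})
print(user)  # name='John' age=30
```

(model_validate is the Pydantic v2 spelling; on v1 the equivalent is parse_obj.)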

Common variations

You can run extraction asynchronously by wrapping AsyncOpenAI with instructor.from_openai and awaiting client.chat.completions.create inside an async function. You can also switch to a different OpenAI model, such as gpt-4o for higher accuracy, or swap in other Pydantic models for different extraction schemas.

python
import asyncio

from openai import AsyncOpenAI

# Async usage requires an AsyncOpenAI client wrapped by Instructor
async_inst_client = instructor.from_openai(AsyncOpenAI())

async def batch_extract_async(texts):
    results = []
    for text in texts:
        response = await async_inst_client.chat.completions.create(
            model="gpt-4o",
            response_model=User,
            messages=[{"role": "user", "content": text}]
        )
        results.append(response)
    return results

texts = [
    "Extract: John is 30 years old",
    "Extract: Alice is 25 years old",
    "Extract: Bob is 40 years old"
]

results = asyncio.run(batch_extract_async(texts))
for res in results:
    print(f"Name: {res.name}, Age: {res.age}")
output
Name: John, Age: 30
Name: Alice, Age: 25
Name: Bob, Age: 40
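Awaiting each request in turn still serializes the round trips; asyncio.gather issues them concurrently for real throughput gains. A minimal sketch of the pattern, with a stand-in coroutine in place of the real await async_inst_client.chat.completions.create(...) call so it runs without an API key:

```python
import asyncio

# Stand-in for one awaited create(...) call; swap the body for the real API call
async def extract_one(text: str) -> dict:
    await asyncio.sleep(0)  # simulates network I/O
    name, _, rest = text.replace("Extract: ", "", 1).partition(" is ")
    return {"name": name, "age": int(rest.split()[0])}

async def batch_extract_concurrent(texts):
    # gather schedules all coroutines at once instead of awaiting one by one
    return await asyncio.gather(*(extract_one(t) for t in texts))

texts = [
    "Extract: John is 30 years old",
    "Extract: Alice is 25 years old",
]
results = asyncio.run(batch_extract_concurrent(texts))
print(results)
```

In practice, cap concurrency (for example with an asyncio.Semaphore) so a large batch does not immediately trip rate limits.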

Troubleshooting

  • If extraction results are missing fields or incorrect, verify your Pydantic model matches the expected output format.
  • If you get authentication errors, ensure OPENAI_API_KEY is set correctly in your environment.
  • For rate limits, batch your requests with delays or use a higher quota plan.
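For rate limits specifically, a small client-side exponential backoff helper is often enough. A sketch where flaky_call stands in for one create(...) call (the OpenAI SDK also retries automatically; see the max_retries argument to OpenAI(...)):

```python
import time

def with_backoff(fn, max_attempts=4, base_delay=1.0):
    """Call fn(), retrying with exponentially growing delays on failure."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            time.sleep(base_delay * 2 ** attempt)

# Demo: a stand-in call that fails twice before succeeding
calls = {"n": 0}
def flaky_call():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

result = with_backoff(flaky_call, base_delay=0.01)
print(result)  # ok, after two retries
```

In real code, catch only the rate-limit exception (openai.RateLimitError) rather than a bare Exception, so genuine bugs still fail fast.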

Key takeaways

  • Use instructor.from_openai with a Pydantic model to batch extract structured data efficiently.
  • Loop over your input texts and call client.chat.completions.create with response_model for each extraction.
  • Async extraction is supported by wrapping AsyncOpenAI with instructor.from_openai and awaiting create, which enables better throughput.
  • Ensure your Pydantic model matches the expected extraction schema to avoid parsing errors.
  • Always set your API key securely via environment variables.
Verified 2026-04 · gpt-4o-mini, gpt-4o