How to extract lists with Instructor
Quick answer
Use the instructor Python library with a Pydantic BaseModel that defines a list field. Call chat.completions.create on the Instructor-wrapped client with response_model set to your model, and the list comes back as a validated Python object instead of free text.
Prerequisites
- Python 3.9+ (the examples use the built-in list[str] annotation)
- OpenAI API key (free tier works)
- pip install "openai>=1.0" instructor pydantic
Setup
Install the required packages and set your OpenAI API key in the environment.
- Install with pip install openai instructor pydantic
- Set the environment variable OPENAI_API_KEY with your API key.
pip install openai instructor pydantic
Step by step
Define a Pydantic model with a list field, then use instructor to extract the list from text via OpenAI chat completion.
import os
from openai import OpenAI
import instructor
from pydantic import BaseModel
# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Wrap OpenAI client with Instructor
instructor_client = instructor.from_openai(client)
# Define Pydantic model with a list field
class ShoppingList(BaseModel):
    items: list[str]
# Input text containing a list
text = "Extract the shopping list: apples, bananas, oranges, and milk."
# Call chat completion with response_model to extract list
response = instructor_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": text}],
    response_model=ShoppingList,
)
# Access extracted list
print("Extracted items:", response.items)
Output
Extracted items: ['apples', 'bananas', 'oranges', 'milk']
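To see what response_model relies on without calling the API, the sketch below uses Pydantic alone (v2 is assumed). It prints the JSON schema derived from the model and then validates a hypothetical raw reply; the raw_reply string is invented for illustration and stands in for whatever the LLM would actually return.

```python
from pydantic import BaseModel


class ShoppingList(BaseModel):
    items: list[str]


# JSON schema derived from the model; a structured-output library can pass a
# schema like this to the LLM to describe the expected shape of the reply.
schema = ShoppingList.model_json_schema()
print(schema["properties"]["items"])

# Hypothetical raw reply (an assumption, not real API output).
raw_reply = '{"items": ["apples", "bananas", "oranges", "milk"]}'

# Validation turns the JSON text into a typed Python object.
parsed = ShoppingList.model_validate_json(raw_reply)
print(parsed.items)
```

This is only a rough picture of the flow: the model class describes the shape, and validation is what guarantees that response.items is a real list of strings.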
Common variations
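- Use async calls by wrapping an AsyncOpenAI client with instructor.from_openai and awaiting chat.completions.create(...).
- Change the model to gpt-4o, which may improve accuracy on harder inputs. To use an Anthropic model such as claude-3-5-sonnet-20241022, wrap an Anthropic client with instructor.from_anthropic instead of the OpenAI client.
- Extract nested lists or complex structures by defining nested Pydantic models.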
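import asyncio
from openai import AsyncOpenAI
# Async calls need an async client wrapped by Instructor
async_client = instructor.from_openai(AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"]))
async def async_extract():
    # With an async client, create(...) is awaited directly
    response = await async_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": text}],
        response_model=ShoppingList,
    )
    print("Async extracted items:", response.items)
asyncio.run(async_extract())
Output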
Async extracted items: ['apples', 'bananas', 'oranges', 'milk']
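The nested-model variation can be sketched offline as well. The Item, Store, and ShoppingPlan classes and the sample payload below are hypothetical names chosen for illustration; the snippet only checks that Pydantic (v2 assumed) accepts a nested structure of the kind an LLM might return. Passing ShoppingPlan as response_model in the calls above would be the natural way to use it.

```python
from typing import Optional

from pydantic import BaseModel


class Item(BaseModel):
    name: str
    quantity: int = 1            # default applies when the text gives no count
    unit: Optional[str] = None   # e.g. "liters"; optional


class Store(BaseModel):
    store: str
    items: list[Item]            # a list of nested models


class ShoppingPlan(BaseModel):
    stores: list[Store]          # a list of lists, via nesting


# Hypothetical payload standing in for a model reply.
sample = {
    "stores": [
        {
            "store": "Grocer",
            "items": [
                {"name": "apples", "quantity": 6},
                {"name": "milk", "quantity": 2, "unit": "liters"},
            ],
        },
        {"store": "Bakery", "items": [{"name": "bread"}]},
    ]
}

plan = ShoppingPlan.model_validate(sample)
for s in plan.stores:
    print(s.store, [(i.name, i.quantity) for i in s.items])
```

Defaults and optional fields matter here: the fewer fields the model is forced to invent, the less likely validation is to fail on sparse input text.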
Troubleshooting
- If the extracted list is incomplete or incorrect, try increasing max_tokens or using a stronger model like gpt-4o.
- Ensure your Pydantic model matches the expected output format exactly.
- If you get validation errors, check the input prompt clarity and model choice.
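When debugging validation errors, it can help to reproduce them locally with Pydantic before involving the API. The sketch below (Pydantic v2 assumed) adds a hypothetical clean_items validator that normalizes entries and rejects an empty list, then shows a malformed payload being rejected. Both payloads are invented for illustration.

```python
from pydantic import BaseModel, ValidationError, field_validator


class ShoppingList(BaseModel):
    items: list[str]

    @field_validator("items")
    @classmethod
    def clean_items(cls, value: list[str]) -> list[str]:
        # Trim whitespace, lowercase, and drop blank entries.
        cleaned = [item.strip().lower() for item in value if item.strip()]
        if not cleaned:
            raise ValueError("expected at least one item")
        return cleaned


# A slightly messy but valid payload is normalized.
good = ShoppingList.model_validate({"items": [" Apples ", "milk", ""]})
print(good.items)

# A payload with the wrong shape (a string instead of a list) is rejected.
try:
    ShoppingList.model_validate({"items": "apples, milk"})
except ValidationError as exc:
    error_count = exc.error_count()
    print(f"{error_count} validation error(s)")
    print(exc.errors()[0]["msg"])
```

Instructor's create call also accepts a max_retries argument, which re-asks the model when validation fails; whether that is worth the extra tokens depends on how often your inputs trip the validator.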
Key Takeaways
- Use instructor with Pydantic models to extract structured lists from text.
- Set response_model in chat.completions.create for automatic parsing.
- Async calls help when running many extractions concurrently, and stronger models can improve accuracy.