How to extract lists with Instructor

Quick answer
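Use the instructor Python library with a Pydantic BaseModel that defines a list field. Wrap the OpenAI client with instructor.from_openai, then call chat.completions.create on the wrapped client with response_model set to your model. The response comes back as a validated instance of that model, so the list is available as a normal Python attribute.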

PREREQUISITES

  • Python 3.9+ (the list[str] annotation used below is not supported on 3.8)
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" instructor pydantic (quote the version specifier so the shell does not treat > as a redirect)

Setup

Install the required packages and set your OpenAI API key in the environment.

  • Install with pip install openai instructor pydantic
  • Set environment variable OPENAI_API_KEY with your API key.
bash
pip install openai instructor pydantic
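
Before making any API calls, it can help to fail fast when the key is missing. The snippet below is a minimal sketch; require_api_key is a hypothetical helper name, not part of openai or instructor.

```python
import os


def require_api_key(env=os.environ) -> str:
    """Return the OpenAI API key or raise a clear error if it is missing.

    Hypothetical helper for illustration; `env` defaults to the process
    environment but can be any mapping, which makes it easy to test.
    """
    key = env.get("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return key
```

Calling require_api_key() at the top of a script surfaces a missing key immediately instead of as a KeyError or an authentication error later on.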

Step by step

Define a Pydantic model with a list field, then use instructor to extract the list from text via OpenAI chat completion.

python
import os
from openai import OpenAI
import instructor
from pydantic import BaseModel

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Wrap OpenAI client with Instructor
instructor_client = instructor.from_openai(client)

# Define Pydantic model with a list field
class ShoppingList(BaseModel):
    items: list[str]

# Input text containing a list
text = "Extract the shopping list: apples, bananas, oranges, and milk."

# Call chat completion with response_model to extract list
response = instructor_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": text}],
    response_model=ShoppingList
)

# Access extracted list
print("Extracted items:", response.items)
output
Extracted items: ['apples', 'bananas', 'oranges', 'milk']
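
To see what the model is being asked to produce, you can inspect the JSON schema that Pydantic generates for ShoppingList and validate a hand-written JSON payload against it. This is an offline sketch that makes no API call; it assumes Pydantic v2, and the sample payload is made up for illustration.

```python
from pydantic import BaseModel


class ShoppingList(BaseModel):
    items: list[str]


# JSON schema derived from the model; structured-output libraries such as
# Instructor build their request from a schema like this one
schema = ShoppingList.model_json_schema()
print(schema["properties"]["items"]["type"])  # array

# Validate a JSON string shaped like a model reply (hand-written sample)
parsed = ShoppingList.model_validate_json('{"items": ["apples", "milk"]}')
print(parsed.items)  # ['apples', 'milk']
```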

Common variations

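  • Use async calls by wrapping AsyncOpenAI with instructor.from_openai and awaiting async_client.chat.completions.create(...). There is no separate acreate method in openai>=1.0.
  • Change model to gpt-4o, which may be more accurate. To use claude-3-5-sonnet-20241022, wrap an Anthropic client with instructor.from_anthropic instead, since the OpenAI client cannot call Claude models.
  • Extract nested lists or complex structures by defining nested Pydantic models.
python
import asyncio
from openai import AsyncOpenAI

# Wrap the async OpenAI client; create() then returns an awaitable
async_client = instructor.from_openai(AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"]))

async def async_extract():
    response = await async_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": text}],
        response_model=ShoppingList
    )
    print("Async extracted items:", response.items)

asyncio.run(async_extract())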
output
Async extracted items: ['apples', 'bananas', 'oranges', 'milk']
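
The nested-model variation listed above can be sketched as follows. The Item and DetailedShoppingList models and the extract_detailed helper are illustrative names, not part of Instructor. The helper expects an Instructor-wrapped client like the one created earlier, and the final lines only check the schema offline with hand-written data.

```python
from pydantic import BaseModel, Field


class Item(BaseModel):
    # One entry in the list; the field names are illustrative
    name: str
    quantity: int = Field(default=1, ge=1)


class DetailedShoppingList(BaseModel):
    # A list of nested models instead of plain strings
    items: list[Item]


def extract_detailed(client, text: str) -> DetailedShoppingList:
    """Ask an Instructor-wrapped client for the nested structure."""
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": text}],
        response_model=DetailedShoppingList,
    )


# Offline check of the schema with hand-written data (no API call)
sample = DetailedShoppingList.model_validate(
    {"items": [{"name": "apples", "quantity": 3}, {"name": "milk"}]}
)
print([(item.name, item.quantity) for item in sample.items])
# [('apples', 3), ('milk', 1)]
```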

Troubleshooting

  • If the list extraction is incomplete or incorrect, try increasing max_tokens or using a stronger model like gpt-4o.
  • Ensure your Pydantic model matches the expected output format exactly.
  • If you get validation errors, check the input prompt clarity and model choice.
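
One way to handle validation errors is to make the model stricter and let Instructor retry. The sketch below assumes Pydantic v2 and uses Instructor's max_retries argument. StrictShoppingList and extract_with_retries are hypothetical names, and the last lines only exercise the validator locally without calling the API.

```python
from pydantic import BaseModel, ValidationError, field_validator


class StrictShoppingList(BaseModel):
    items: list[str]

    @field_validator("items")
    @classmethod
    def items_must_be_clean(cls, value: list[str]) -> list[str]:
        # Strip whitespace and reject empty results so bad output fails loudly
        cleaned = [item.strip() for item in value if item.strip()]
        if not cleaned:
            raise ValueError("expected at least one item")
        return cleaned


def extract_with_retries(client, text: str) -> StrictShoppingList:
    # max_retries asks Instructor to re-prompt the model when validation fails
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": text}],
        response_model=StrictShoppingList,
        max_retries=2,
    )


# Offline demonstration of the validator (no API call)
print(StrictShoppingList(items=[" apples ", "milk"]).items)  # ['apples', 'milk']
try:
    StrictShoppingList(items=["   "])
except ValidationError as exc:
    print("validation failed with", exc.error_count(), "error")
```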

Key Takeaways

  • Use instructor with Pydantic models to extract structured lists from text.
  • Set response_model in chat.completions.create for automatic parsing.
  • Async clients help when running many extractions concurrently, and stronger models can improve accuracy.
Verified 2026-04 · gpt-4o-mini, gpt-4o, claude-3-5-sonnet-20241022