Fix wrong fields in Instructor extraction
Quick answer
To fix wrong field extraction in instructor, define a precise Pydantic BaseModel matching the expected response fields and pass it as response_model= in client.chat.completions.create. Ensure your messages prompt the model to output data conforming exactly to your model's schema.
PREREQUISITES
- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0" (quote the specifier so the shell does not treat > as a redirect)
- pip install instructor pydantic
Setup
Install the required packages and set your OpenAI API key as an environment variable.
```bash
pip install openai instructor pydantic
```
Step by step
Define a Pydantic model that exactly matches the fields you want to extract. Use instructor.from_openai to wrap the OpenAI client and pass the model as response_model in the chat completion call. Provide a clear extraction prompt.
```python
import os

import instructor
from openai import OpenAI
from pydantic import BaseModel


# Define the Pydantic model matching the fields you expect to extract
class User(BaseModel):
    name: str
    age: int


# Initialize the OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Wrap it with Instructor so create() accepts response_model=
inst_client = instructor.from_openai(client)

# Prepare messages with an explicit extraction instruction
messages = [{"role": "user", "content": "Extract: John is 30 years old"}]

# Instructor validates the reply against User and returns a User instance
user = inst_client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=User,
    messages=messages,
)
print(user.name, user.age)
```
Output:
John 30
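Instructor turns your Pydantic model into a JSON schema for the LLM, so field names and descriptions effectively become part of the prompt. The sketch below, assuming Pydantic v2, adds Field(description=...) to each field and prints the schema locally so you can see what the LLM is asked to fill in. The description wording is a hypothetical example, not a required phrasing.

```python
from pydantic import BaseModel, Field


# Hypothetical descriptions: they spell out what belongs in each field,
# e.g. so that "age" is less likely to receive a birth year.
class User(BaseModel):
    name: str = Field(description="The person's full name as written in the text")
    age: int = Field(description="Age in whole years, not a birth year")


# Inspect the JSON schema generated from the model (no API call needed).
schema = User.model_json_schema()
print(sorted(schema["properties"]))  # → ['age', 'name']
print(schema["properties"]["age"]["description"])  # → Age in whole years, not a birth year
```

Checking the schema this way is a quick method for spotting ambiguous or misspelled field names before spending API calls.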
Common variations
You can use other OpenAI models such as gpt-4o with the same instructor.from_openai wrapper, and Anthropic models by wrapping an Anthropic client with instructor.from_anthropic. For asynchronous usage, wrap an AsyncOpenAI client and await chat.completions.create inside an async function. Always ensure your Pydantic model fields exactly match the expected output keys to avoid extraction errors.
```python
import asyncio
import os

import instructor
from openai import AsyncOpenAI
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


async def main():
    # Async code needs the async client; the sync OpenAI client cannot be awaited
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    inst_client = instructor.from_openai(client)
    messages = [{"role": "user", "content": "Extract: Alice is 25 years old"}]
    # With an async client, create() returns an awaitable
    user = await inst_client.chat.completions.create(
        model="gpt-4o-mini",
        response_model=User,
        messages=messages,
    )
    print(user.name, user.age)


asyncio.run(main())
```
Output:
Alice 25
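Matching field names is not always enough: the LLM can put a well-formed value in the wrong field. A minimal sketch, assuming Pydantic v2, of a field validator that rejects such values; the validator name age_must_be_plausible and the 0 to 130 range are illustrative choices, not part of Instructor. Instructor's max_retries parameter re-asks the LLM when validation fails, passing the error message back.

```python
from pydantic import BaseModel, ValidationError, field_validator


class User(BaseModel):
    name: str
    age: int

    # Hypothetical check: reject numbers that look like they belong elsewhere,
    # such as a birth year landing in "age".
    @field_validator("age")
    @classmethod
    def age_must_be_plausible(cls, value: int) -> int:
        if not 0 <= value <= 130:
            raise ValueError("age must be in years (0-130), not a calendar year")
        return value


# Offline check: a birth year placed in "age" fails validation.
error_message = None
try:
    User.model_validate({"name": "John", "age": 1994})
except ValidationError as exc:
    error_message = exc.errors()[0]["msg"]
print(error_message)

# With Instructor, the same model plus max_retries lets it retry on this error:
# user = inst_client.chat.completions.create(
#     model="gpt-4o-mini", response_model=User, messages=messages, max_retries=2
# )
```

Keep validators narrow: each extra retry is another API call, so they should catch clearly wrong values rather than stylistic differences.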
Troubleshooting
- If fields are missing or extraction fails, verify that your Pydantic model field names exactly match the keys you expect in the JSON output. Instructor builds the schema it sends to the LLM from your model, so vague names make wrong assignments more likely.
- Ensure your prompt clearly instructs the model to output structured data matching your model.
- Check that you pass the response_model= parameter and not response_format=.
- Update the instructor and openai packages to the latest versions to avoid compatibility issues.
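To see exactly which keys fail to match, you can validate a raw dictionary against your model offline. This sketch assumes Pydantic v2; the raw dict with a "full_name" key is hypothetical LLM output. By default Pydantic silently ignores undeclared keys, so ConfigDict(extra="forbid") is used here to surface them as errors (it also marks the generated schema with additionalProperties set to false).

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class User(BaseModel):
    # Reject undeclared keys instead of silently ignoring them (Pydantic's default).
    model_config = ConfigDict(extra="forbid")

    name: str
    age: int


# Hypothetical raw output in which the LLM wrote "full_name" instead of "name".
raw = {"full_name": "John", "age": 30}

problems = []
try:
    User.model_validate(raw)
except ValidationError as exc:
    # (field path, error type) pairs show exactly which keys did not match.
    problems = sorted(
        (".".join(str(part) for part in err["loc"]), err["type"]) for err in exc.errors()
    )
print(problems)  # → [('full_name', 'extra_forbidden'), ('name', 'missing')]
```

A "missing" error paired with an "extra_forbidden" error on a similar name is a strong hint that the field should be renamed or described more clearly.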
Key Takeaways
- Define Pydantic models that exactly match the expected extraction fields to fix wrong-field issues.
- Use the response_model= parameter in instructor chat completions for structured extraction.
- Ensure prompts clearly instruct the model to output data conforming to your Pydantic schema.
- Keep the instructor and openai SDKs updated for best compatibility.
- For asynchronous extraction, wrap an AsyncOpenAI client and await chat.completions.create.