
How to extract entities with Instructor

Quick answer
Use the instructor Python library with an OpenAI client to extract entities: define a pydantic.BaseModel describing the fields you expect, wrap the client with instructor.from_openai, and pass the model as response_model to client.chat.completions.create. Instructor then parses and validates the model's reply into an instance of that class.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" instructor pydantic

Setup

Install the required packages and set your OpenAI API key as an environment variable.

  • Install packages: pip install openai instructor pydantic
  • Set environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows; open a new terminal afterwards, since setx does not affect the current session)
bash
pip install openai instructor pydantic
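
Before making any API call, it can help to confirm that the key is actually visible to Python. The following is a minimal sketch using only the standard library; the helper name require_api_key is hypothetical and is not part of openai or instructor.

```python
import os

def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key from env, or raise a readable error if it is missing or blank."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the examples.")
    return key

# Usage (hypothetical helper): pass the result to OpenAI(api_key=...)
# api_key = require_api_key()
```

Failing early with a named variable tends to be easier to debug than the KeyError raised by os.environ["OPENAI_API_KEY"].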

Step by step

Define a pydantic.BaseModel to specify the entity fields you want to extract. Use instructor.from_openai to wrap the OpenAI client, then call chat.completions.create with response_model to parse the output into structured entities.

python
import os
from openai import OpenAI
import instructor
from pydantic import BaseModel

# Define the entity extraction schema
class Entities(BaseModel):
    person: str
    organization: str
    location: str

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Wrap OpenAI client with Instructor
inst_client = instructor.from_openai(client)

# Input text to extract entities from
text = "John Doe works at OpenAI in San Francisco."

# Call chat completion with response_model for structured extraction
response = inst_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"Extract person, organization, and location from this text: {text}"}],
    response_model=Entities
)

# The call returns a validated Entities instance, not a raw completion
entities = response
print(f"Person: {entities.person}")
print(f"Organization: {entities.organization}")
print(f"Location: {entities.location}")
output
Person: John Doe
Organization: OpenAI
Location: San Francisco
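
The schema above holds one value per field, so a text that mentions several people or organizations would not fit it. One possible variation, sketched here as an assumption rather than as the method used in this guide, is to use list fields. The class name EntityLists and the sample payload are illustrative only. The snippet validates a hand-written JSON string locally, so it runs without an API key; the same class could be passed as response_model.

```python
import json
from typing import List

from pydantic import BaseModel

class EntityLists(BaseModel):
    # Each field collects every mention found; empty when none is present
    people: List[str] = []
    organizations: List[str] = []
    locations: List[str] = []

# Hand-written stand-in for a model reply (not real API output)
sample_reply = '{"people": ["John Doe", "Jane Roe"], "organizations": ["OpenAI"]}'

entities = EntityLists(**json.loads(sample_reply))
print(entities.people)
print(entities.locations)
```

Defaulting each list to empty lets the model omit a category that does not appear in the text instead of failing validation.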

Common variations

You can use different models, such as gpt-4o for higher accuracy or gpt-4o-mini for lower cost. The instructor library also supports Anthropic clients via instructor.from_anthropic. For asynchronous usage, wrap an AsyncOpenAI client instead and await the same chat.completions.create call.

python
import asyncio
import os
from openai import AsyncOpenAI
import instructor
from pydantic import BaseModel

class Entities(BaseModel):
    person: str
    organization: str
    location: str

async def async_entity_extraction():
    # The async client makes chat.completions.create awaitable
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    inst_client = instructor.from_openai(client)

    text = "Alice works at Anthropic in San Francisco."

    response = await inst_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Extract person, organization, and location from this text: {text}"}],
        response_model=Entities
    )

    print(f"Person: {response.person}")
    print(f"Organization: {response.organization}")
    print(f"Location: {response.location}")

asyncio.run(async_entity_extraction())
output
Person: Alice
Organization: Anthropic
Location: San Francisco

Troubleshooting

  • If you get validation errors, ensure your pydantic.BaseModel matches the expected output format.
  • If the model returns unstructured text, try refining the prompt to explicitly request JSON or structured output.
  • Check that your OPENAI_API_KEY environment variable is set correctly.
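
To see what a validation error looks like without calling the API, the sketch below feeds an incomplete, hand-written payload to the Entities model from the steps above and lists the fields that failed. The helper name missing_fields is hypothetical. Instructor's create call also accepts a max_retries argument that re-asks the model when validation fails, which may be worth trying before rewriting the schema.

```python
from pydantic import BaseModel, ValidationError

class Entities(BaseModel):
    person: str
    organization: str
    location: str

def missing_fields(payload):
    """Return the sorted names of fields that fail validation for payload."""
    try:
        Entities(**payload)
    except ValidationError as exc:
        # Each error dict carries a "loc" tuple naming the offending field
        return sorted(str(err["loc"][0]) for err in exc.errors())
    return []

# Hand-written incomplete reply (not real API output)
print(missing_fields({"person": "John Doe"}))
```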

Key Takeaways

  • Use instructor with a pydantic.BaseModel to extract structured entities from text.
  • Pass response_model to chat.completions.create for automatic parsing.
  • Refine prompts to improve extraction accuracy and output format.
  • Instructor supports both synchronous (OpenAI) and asynchronous (AsyncOpenAI) clients.
  • Always set your API key securely via environment variables.
Verified 2026-04 · gpt-4o-mini, gpt-4o