
How to extract tables with Instructor

Quick answer
Use the Instructor library with an OpenAI client to extract tables: define a pydantic model that represents the table structure and pass it as response_model to client.chat.completions.create. Instructor validates the model's reply against that schema and returns it as a typed pydantic object, so you never parse the raw response yourself.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" instructor pydantic (quote the version specifier so the shell does not treat > as a redirect)

Setup

Install the required packages and set your OpenAI API key as an environment variable.

  • Install packages: pip install openai instructor pydantic
  • Set environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows; setx only affects terminals opened afterwards)
bash
pip install openai instructor pydantic
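
Before making any request, it can help to fail fast when the key is missing. The following is a minimal sketch using only the standard library; the helper name require_api_key is hypothetical and is not part of Instructor or the OpenAI SDK.

```python
import os


def require_api_key(env=None):
    """Return the OpenAI API key from the environment, or raise a clear error.

    Illustrative helper only; `env` defaults to os.environ and can be
    replaced with a plain dict when testing.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the examples."
        )
    return key
```

Calling require_api_key() at the top of a script turns a confusing authentication failure into an immediate, readable error.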

Step by step

Define a pydantic model representing the table schema, then use instructor.from_openai to create a client that wraps the OpenAI client. Call chat.completions.create with response_model set to your table model to extract tables from text.

python
import os
from openai import OpenAI
import instructor
from pydantic import BaseModel
from typing import List

# Define a pydantic model for a table row
class TableRow(BaseModel):
    item: str
    quantity: int
    price: float

# Define a model for the entire table
class Table(BaseModel):
    rows: List[TableRow]

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Wrap OpenAI client with Instructor
inst_client = instructor.from_openai(client)

# Input text containing a table
text = """
Here is the sales data:

| Item       | Quantity | Price  |
|------------|----------|--------|
| Apples     | 10       | 0.5    |
| Bananas    | 5        | 0.3    |
| Cherries   | 20       | 1.5    |
"""

# Create chat completion with response_model to extract table.
# The return value is already a validated Table instance, not a raw completion.
table = inst_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"Extract the table from the following text:\n{text}"}],
    response_model=Table
)

# Access structured table data
for row in table.rows:
    print(f"Item: {row.item}, Quantity: {row.quantity}, Price: {row.price}")
output
Item: Apples, Quantity: 10, Price: 0.5
Item: Bananas, Quantity: 5, Price: 0.3
Item: Cherries, Quantity: 20, Price: 1.5
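
Because the result is an ordinary pydantic object, it can be post-processed like any other Python data. As a sketch, the snippet below writes the rows to CSV text with the standard csv module. It assumes pydantic v2 (for model_dump and model_fields); the table_to_csv helper and the hand-built Table are illustrative stand-ins for a real response.

```python
import csv
import io
from typing import List

from pydantic import BaseModel


class TableRow(BaseModel):
    item: str
    quantity: int
    price: float


class Table(BaseModel):
    rows: List[TableRow]


def table_to_csv(table: Table) -> str:
    """Serialize a Table to CSV text (hypothetical helper, pydantic v2 assumed)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(TableRow.model_fields),  # item, quantity, price
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(row.model_dump() for row in table.rows)
    return buffer.getvalue()


# Stand-in for the object returned by inst_client.chat.completions.create(...)
table = Table(rows=[
    TableRow(item="Apples", quantity=10, price=0.5),
    TableRow(item="Bananas", quantity=5, price=0.3),
])
print(table_to_csv(table))
```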

Common variations

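You can make asynchronous calls by wrapping an AsyncOpenAI client instead of OpenAI and awaiting chat.completions.create. You can also switch to other OpenAI models such as gpt-4o for higher accuracy, or use Anthropic models by wrapping their client with instructor.from_anthropic. Streaming is possible as well: Instructor provides create_partial and create_iterable for streaming structured output, although a small table is usually simplest to extract in a single non-streamed call.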

python
import asyncio
import os

import instructor
from openai import AsyncOpenAI
from pydantic import BaseModel
from typing import List

class TableRow(BaseModel):
    item: str
    quantity: int
    price: float

class Table(BaseModel):
    rows: List[TableRow]

async def async_extract():
    # Async calls need AsyncOpenAI; the wrapped client keeps the same
    # chat.completions.create method, which is now awaitable.
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    inst_client = instructor.from_openai(client)

    text = """
    | Item       | Quantity | Price  |
    |------------|----------|--------|
    | Oranges    | 15       | 0.7    |
    | Grapes     | 8        | 2.0    |
    """

    table = await inst_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Extract the table from the following text:\n{text}"}],
        response_model=Table
    )

    for row in table.rows:
        print(f"Item: {row.item}, Quantity: {row.quantity}, Price: {row.price}")

# To run async example:
# asyncio.run(async_extract())
output
Item: Oranges, Quantity: 15, Price: 0.7
Item: Grapes, Quantity: 8, Price: 2.0
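
The Anthropic variation mentioned above might look like the following sketch. It assumes the anthropic package is installed and ANTHROPIC_API_KEY is set. The extract_table_with_claude helper is hypothetical, and the model name is left as a required argument because the right value depends on your account. Anthropic's Messages API requires max_tokens, so it is passed explicitly.

```python
from typing import List

from pydantic import BaseModel


class TableRow(BaseModel):
    item: str
    quantity: int
    price: float


class Table(BaseModel):
    rows: List[TableRow]


def extract_table_with_claude(client, text: str, model: str) -> Table:
    """Extract a Table using a client created by instructor.from_anthropic.

    Hypothetical helper: `client` is expected to be the wrapped Anthropic
    client, and `model` is whichever Claude model you have access to.
    """
    return client.messages.create(
        model=model,
        max_tokens=1024,  # required by Anthropic's Messages API
        messages=[{
            "role": "user",
            "content": f"Extract the table from the following text:\n{text}",
        }],
        response_model=Table,
    )


# Usage (needs network access and an Anthropic API key):
# import anthropic
# import instructor
# client = instructor.from_anthropic(anthropic.Anthropic())
# table = extract_table_with_claude(client, text, model="<your Claude model>")
```

Taking the client as a parameter keeps the schema and prompt identical across providers; only the wrapped client and model name change.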

Troubleshooting

  • If the extracted data is incomplete or incorrect, ensure your pydantic model matches the table structure exactly.
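  • If you get validation errors, check that your field names and types match what the table actually contains, and consider passing max_retries to create so Instructor can re-ask the model with the validation error attached.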
  • Use a more capable model like gpt-4o if extraction quality is poor.
  • Verify your OPENAI_API_KEY is set correctly and has access to the model.
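
For the validation-related points above, one option is to encode expectations in the model itself and let Instructor retry on failure. The sketch below assumes pydantic v2 (field_validator) and uses Instructor's max_retries argument. The specific rule (no negative quantity or price) and the extract_with_retries helper are illustrative assumptions, not requirements of either library.

```python
from typing import List

from pydantic import BaseModel, ValidationError, field_validator


class TableRow(BaseModel):
    item: str
    quantity: int
    price: float

    @field_validator("quantity", "price")
    @classmethod
    def must_not_be_negative(cls, value):
        # Example rule only; adapt it to what your tables can contain.
        if value < 0:
            raise ValueError("must not be negative")
        return value


class Table(BaseModel):
    rows: List[TableRow]


def extract_with_retries(inst_client, text: str, retries: int = 2) -> Table:
    """Hypothetical helper: `inst_client` is the client from instructor.from_openai."""
    return inst_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Extract the table from the following text:\n{text}",
        }],
        response_model=Table,
        max_retries=retries,  # re-ask the model when validation fails
    )


# The validator also runs locally, which makes the schema easy to check offline.
try:
    TableRow(item="Apples", quantity=-1, price=0.5)
except ValidationError as exc:
    print(f"Rejected row with {exc.error_count()} validation error(s)")
```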

Key Takeaways

  • Define a precise pydantic model to represent your table schema for accurate extraction.
  • Use instructor.from_openai to wrap the OpenAI client and enable structured extraction with response_model.
  • Switch to more powerful models like gpt-4o for better table extraction accuracy.
  • Async extraction is supported by wrapping AsyncOpenAI and awaiting chat.completions.create, which suits scalable applications.
  • Validate your environment variables and model compatibility to avoid extraction errors.
Verified 2026-04 · gpt-4o-mini, gpt-4o