
How to extract tables with Instructor

Quick answer

Use the Instructor library with an OpenAI client to extract tables. Define a Pydantic model that describes the table's structure, then pass it as response_model to chat.completions.create on an Instructor-wrapped client. Instructor validates the LLM's response against that model and returns a populated instance, so you get structured table data without writing your own parsing code.
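
Roughly speaking, Instructor (in its default tool-calling mode) sends a JSON schema derived from your model to the LLM and then validates the JSON that comes back against that model. The offline sketch below imitates only the validation half, using a hard-coded, hypothetical response string instead of a real API call. Retries and mode handling are left out. It assumes Pydantic v2, and count_errors is a hypothetical helper.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class TableRow(BaseModel):
    item: str
    quantity: int
    price: float


class Table(BaseModel):
    rows: List[TableRow]


# JSON schema generated from the model (Instructor derives its request schema from the model)
schema = Table.model_json_schema()
print(schema["properties"]["rows"]["type"])  # array

# Hypothetical raw JSON standing in for what the LLM might return (no API call here)
raw = '{"rows": [{"item": "Apples", "quantity": 10, "price": 0.5}]}'
table = Table.model_validate_json(raw)
print(table.rows[0].item, table.rows[0].quantity)  # Apples 10


def count_errors(raw_json: str) -> int:
    """Return how many validation errors Pydantic reports for raw_json."""
    try:
        Table.model_validate_json(raw_json)
    except ValidationError as exc:
        return exc.error_count()
    return 0


# "ten" cannot be parsed as an int, so validation fails with one error
print(count_errors('{"rows": [{"item": "Apples", "quantity": "ten", "price": 0.5}]}'))  # 1
```

A validation error like the last one is what Instructor surfaces when the LLM's output does not fit your model.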

PREREQUISITES

Python 3.8+
OpenAI API key (free tier works)
pip install "openai>=1.0" instructor pydantic (quote the version spec, or the shell treats >= as output redirection)

Setup

Install the required packages and set your OpenAI API key as an environment variable.

Install packages: pip install "openai>=1.0" instructor pydantic
Set the environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows; setx only affects terminals opened afterwards)

bash

pip install "openai>=1.0" instructor pydantic
export OPENAI_API_KEY='your_api_key'
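
Before running the examples, a quick preflight check can catch a missing package, a pre-1.0 openai install, or an unset key. This is an optional, stdlib-only sketch. The helper name check_setup is hypothetical and not part of Instructor or the OpenAI SDK.

```python
import os
from importlib.metadata import PackageNotFoundError, version
from typing import List, Mapping, Optional

REQUIRED_PACKAGES = ("openai", "instructor", "pydantic")


def check_setup(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of setup problems; an empty list means ready to go."""
    env = os.environ if env is None else env
    problems = []
    for pkg in REQUIRED_PACKAGES:
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            problems.append(f"{pkg} is not installed")
            continue
        # The examples rely on the openai>=1.0 client interface
        if pkg == "openai" and int(installed.split(".")[0]) < 1:
            problems.append(f"openai {installed} is too old; need >=1.0")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in check_setup():
        print("Setup problem:", problem)
```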

Step by step

Define a Pydantic model for the table schema, then wrap the OpenAI client with instructor.from_openai. Call chat.completions.create on the wrapped client with response_model set to your table model. The call returns an instance of that model rather than a raw chat completion.

python

import os
from openai import OpenAI
import instructor
from pydantic import BaseModel
from typing import List

# Define a pydantic model for a table row
class TableRow(BaseModel):
    item: str
    quantity: int
    price: float

# Define a model for the entire table
class Table(BaseModel):
    rows: List[TableRow]

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Wrap OpenAI client with Instructor
inst_client = instructor.from_openai(client)

# Input text containing a table
text = """
Here is the sales data:

| Item       | Quantity | Price  |
|------------|----------|--------|
| Apples     | 10       | 0.5    |
| Bananas    | 5        | 0.3    |
| Cherries   | 20       | 1.5    |
"""

# Create a chat completion; response_model makes Instructor return a Table instance
table = inst_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"Extract the table from the following text:\n{text}"}],
    response_model=Table,
)

# Access the structured table data
for row in table.rows:
    print(f"Item: {row.item}, Quantity: {row.quantity}, Price: {row.price}")

output

Item: Apples, Quantity: 10, Price: 0.5
Item: Bananas, Quantity: 5, Price: 0.3
Item: Cherries, Quantity: 20, Price: 1.5
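
When the source text is already a well-formed Markdown table, as it is here, a deterministic parse is a cheap baseline for spot-checking the LLM's extraction. This stdlib-only sketch is a swapped-in technique and not part of Instructor. parse_markdown_table is a hypothetical helper, and it assumes the text contains exactly one pipe table: a header row, then a |---| separator row, then data rows.

```python
from typing import Dict, List, Tuple


def parse_markdown_table(text: str) -> List[Dict[str, str]]:
    """Parse the pipe-table lines in text (assumes a single table) into row dicts."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip().startswith("|")]
    if len(lines) < 2:
        return []

    def cells(line: str) -> List[str]:
        # Drop the outer pipes, split on the inner ones, trim the padding
        return [c.strip() for c in line.strip("|").split("|")]

    header = cells(lines[0])
    # lines[1] is the |----| separator row, so data rows start at index 2
    return [dict(zip(header, cells(line))) for line in lines[2:]]


text = """
Here is the sales data:

| Item       | Quantity | Price  |
|------------|----------|--------|
| Apples     | 10       | 0.5    |
| Bananas    | 5        | 0.3    |
| Cherries   | 20       | 1.5    |
"""

baseline: List[Tuple[str, int, float]] = [
    (r["Item"], int(r["Quantity"]), float(r["Price"])) for r in parse_markdown_table(text)
]
print(baseline)  # [('Apples', 10, 0.5), ('Bananas', 5, 0.3), ('Cherries', 20, 1.5)]
```

With the table returned by the main example, comparing [(r.item, r.quantity, r.price) for r in table.rows] against baseline flags any missing or misread rows. The baseline only helps for clean Markdown input; for free-form text the LLM extraction is the point.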

Common variations

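For async code, wrap an openai.AsyncOpenAI client with instructor.from_openai and await chat.completions.create; openai>=1.0 has no acreate method. You can also switch to other OpenAI models such as gpt-4o for higher accuracy, or use Anthropic models by wrapping an Anthropic client with instructor.from_anthropic. Instructor also supports streaming partially populated models (for example with create_partial), although for a short table a single non-streamed call is simpler.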

python

import asyncio
import os
from typing import List

import instructor
from openai import AsyncOpenAI
from pydantic import BaseModel


class TableRow(BaseModel):
    item: str
    quantity: int
    price: float


class Table(BaseModel):
    rows: List[TableRow]


# Wrap the async OpenAI client; its create() method must be awaited
inst_client = instructor.from_openai(AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"]))

text = """
| Item       | Quantity | Price  |
|------------|----------|--------|
| Oranges    | 15       | 0.7    |
| Grapes     | 8        | 2.0    |
"""


async def async_extract():
    # openai>=1.0 has no acreate(); await create() on the async client instead
    table = await inst_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Extract the table from the following text:\n{text}"}],
        response_model=Table,
    )
    for row in table.rows:
        print(f"Item: {row.item}, Quantity: {row.quantity}, Price: {row.price}")


# Run the async example
asyncio.run(async_extract())

output

Item: Oranges, Quantity: 15, Price: 0.7
Item: Grapes, Quantity: 8, Price: 2.0
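
The main payoff of the async client is running many extractions concurrently. The following is a minimal sketch of that pattern, assuming you put the awaited Instructor call inside a coroutine. Here extract_table is a hypothetical offline stand-in that only counts data rows, so the concurrency logic runs without an API key. The semaphore caps how many requests are in flight, which helps you stay under rate limits.

```python
import asyncio
from typing import List


async def extract_table(text: str) -> int:
    # Hypothetical stand-in for the real call, e.g.
    #   await inst_client.chat.completions.create(
    #       model="gpt-4o", messages=[...], response_model=Table)
    # It simulates latency and just counts the table's data rows.
    await asyncio.sleep(0.01)
    pipe_lines = [ln for ln in text.splitlines() if ln.strip().startswith("|")]
    return max(len(pipe_lines) - 2, 0)  # minus the header and separator rows


async def extract_many(texts: List[str], max_concurrency: int = 5) -> List[int]:
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(text: str) -> int:
        async with semaphore:  # at most max_concurrency calls run at once
            return await extract_table(text)

    # gather returns results in the same order as the inputs
    return list(await asyncio.gather(*(bounded(t) for t in texts)))


docs = [
    "| Item | Qty |\n|---|---|\n| Oranges | 15 |\n| Grapes | 8 |",
    "| Item | Qty |\n|---|---|\n| Pears | 3 |",
]
print(asyncio.run(extract_many(docs)))  # [2, 1]
```

Swapping the stand-in for the real awaited call gives one Table per document, returned in input order.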

Troubleshooting

If the extracted data is incomplete or incorrect, make sure your Pydantic model has a field for every column you need.
If you get validation errors, check that each field's type fits the values in the table (for example, price as float rather than int).
Use a more capable model like gpt-4o if extraction quality is poor.
Verify your OPENAI_API_KEY is set correctly and has access to the model.
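
Constraints on the model give validation something concrete to enforce, and Field descriptions end up in the JSON schema the LLM sees. This is a sketch assuming Pydantic v2. The specific constraints (non-empty item, non-negative quantity and price) are illustrative choices for this sales table, and validation_messages is a hypothetical helper.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class TableRow(BaseModel):
    # Descriptions are included in the model's JSON schema
    item: str = Field(min_length=1, description="Product name from the Item column")
    quantity: int = Field(ge=0, description="Whole-number value from the Quantity column")
    price: float = Field(ge=0, description="Unit price from the Price column")


class Table(BaseModel):
    rows: List[TableRow] = Field(description="One entry per data row, header excluded")


def validation_messages(data: dict) -> List[str]:
    """Return 'location: message' strings for each error (empty if data validates)."""
    try:
        Table.model_validate(data)
    except ValidationError as exc:
        return [f"{'.'.join(map(str, err['loc']))}: {err['msg']}" for err in exc.errors()]
    return []


print(validation_messages({"rows": [{"item": "Apples", "quantity": -1, "price": 0.5}]}))
```

With the Instructor client, pass the constrained model as response_model and add max_retries (for example max_retries=2) to create. When validation fails, Instructor sends the error back to the model and asks again instead of failing on the first attempt.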


Key Takeaways

Define a precise Pydantic model to represent your table schema for accurate extraction.
Use instructor.from_openai to wrap the OpenAI client and enable structured extraction with response_model.
Switch to more powerful models like gpt-4o for better table extraction accuracy.
For async extraction, wrap AsyncOpenAI with instructor.from_openai and await chat.completions.create; this suits high-throughput applications.
Check that OPENAI_API_KEY is set and that your key can access the model you request.

Verified 2026-04 · gpt-4o-mini, gpt-4o
