How to · Intermediate · 3 min read

How to stream structured outputs with Instructor

Quick answer
Wrap your OpenAI client with instructor.from_openai and call chat.completions.create_partial to receive structured outputs incrementally. Define a pydantic BaseModel as the response_model; Instructor re-parses the accumulating JSON into progressively more complete model instances.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key
  • pip install "openai>=1.0" instructor pydantic

Setup

Install the required packages and set your OpenAI API key as an environment variable.

bash
pip install "openai>=1.0" instructor pydantic
export OPENAI_API_KEY='your_api_key'

Step by step

Define a pydantic.BaseModel for the structured output, then wrap the OpenAI client with instructor.from_openai. Call create_partial to receive partial structured responses as they arrive; each yielded object is an instance of your model containing the fields parsed so far.

python
import os
from openai import OpenAI
import instructor
from pydantic import BaseModel

# Define the structured output model
class UserInfo(BaseModel):
    name: str
    age: int

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Create Instructor client wrapping OpenAI
inst = instructor.from_openai(client)

# Prepare messages
messages = [{"role": "user", "content": "Extract user info: John is 30 years old."}]

# Stream partial structured output (create_partial streams by default)
stream = inst.chat.completions.create_partial(
    model="gpt-4o-mini",
    messages=messages,
    response_model=UserInfo,
)

print("Streaming structured output:")
for partial in stream:
    # partial holds the fields parsed so far; fields not yet streamed are None
    print(partial)

# Note: exact intermediate chunks vary run to run; the final object has all fields parsed.
output
Streaming structured output:
name='J' age=None
name='John' age=None
name='John' age=3
name='John' age=30
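Why can a partial object validate with age missing? Conceptually, Instructor's partial streaming behaves roughly as if every field of the model were made optional while the stream is in flight. This is an illustrative pydantic-only sketch of that idea, not Instructor's actual implementation; the PartialUserInfo name is made up here:

```python
from typing import Optional
from pydantic import BaseModel

# Illustrative stand-in for what partial streaming produces:
# every field optional, so half-parsed JSON still validates.
class PartialUserInfo(BaseModel):
    name: Optional[str] = None
    age: Optional[int] = None

# Simulate the progressively larger JSON fragments seen during streaming
print(PartialUserInfo.model_validate({}))                           # name=None age=None
print(PartialUserInfo.model_validate({"name": "John"}))             # name='John' age=None
print(PartialUserInfo.model_validate({"name": "John", "age": 30}))  # name='John' age=30
```

Each fragment validates cleanly, which is what lets the loop above print usable objects before the stream finishes.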

Common variations

You can stream asynchronously with async for loops, switch to other models such as gpt-4o, or stream multiple structured objects with create_iterable. Instructor also supports Anthropic and other providers alongside OpenAI.

python
import asyncio
from openai import AsyncOpenAI

# Async streaming requires the async client
async_inst = instructor.from_openai(AsyncOpenAI())

async def async_stream():
    stream = async_inst.chat.completions.create_partial(
        model="gpt-4o-mini",
        messages=messages,
        response_model=UserInfo,
    )
    async for partial in stream:
        print(partial)

asyncio.run(async_stream())
output
name='J' age=None
name='John' age=None
name='John' age=3
name='John' age=30
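The "multiple structured objects" variation can be sketched with create_iterable, which yields one fully parsed model per extracted item as the stream arrives. The prompt below is illustrative, and the call is guarded so the sketch is a no-op without credentials (the imports sit inside the guard for the same reason):

```python
import os
from pydantic import BaseModel

class UserInfo(BaseModel):
    name: str
    age: int

# Guarded so the sketch does nothing without an API key
if os.environ.get("OPENAI_API_KEY"):
    import instructor
    from openai import OpenAI

    client = instructor.from_openai(OpenAI())
    users = client.chat.completions.create_iterable(
        model="gpt-4o-mini",
        response_model=UserInfo,
        messages=[{"role": "user",
                   "content": "Extract all users: John is 30, Jane is 25."}],
    )
    for user in users:  # each item is a complete UserInfo, yielded as parsed
        print(user)
```

Unlike create_partial, which refines a single object, create_iterable emits each object only once it is complete, so no field is ever None.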

Troubleshooting

  • If streaming raises JSON parse or validation errors, ensure your response_model matches the schema the prompt asks the model to produce.
  • Check your API key and environment variables if no response is received.
  • Use smaller max_tokens or simpler prompts if the stream stalls.
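The first bullet can be reproduced with plain pydantic: when streamed JSON drifts from the schema, validation fails with the same class of error Instructor surfaces. The mismatched payload below is illustrative:

```python
from pydantic import BaseModel, ValidationError

class UserInfo(BaseModel):
    name: str
    age: int

# "thirty" cannot be coerced to int, so validation fails, which is
# how a schema mismatch shows up during streaming as well.
try:
    UserInfo.model_validate({"name": "John", "age": "thirty"})
except ValidationError as e:
    print("schema mismatch:", e.error_count(), "error(s)")
```

Inspecting the ValidationError tells you exactly which field the model's output disagreed with, which usually points at a prompt or schema fix.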

Key Takeaways

  • Use Instructor's create_partial to receive incremental structured outputs.
  • Define a precise pydantic.BaseModel as response_model for real-time parsing.
  • Instructor integrates seamlessly with OpenAI's Python SDK for streaming JSON responses.
  • Async streaming works with AsyncOpenAI and async for loops.
  • Validate your schema and environment setup to avoid streaming parse errors.
Verified 2026-04 · gpt-4o-mini