How to stream structured outputs with Instructor
Quick answer
Use Instructor with the stream=True parameter in the chat.completions.create method to receive structured outputs incrementally. Define a pydantic BaseModel, wrap it in instructor.Partial, and pass it as the response_model to parse streamed JSON data in real time.

Prerequisites
- Python 3.8+
- An OpenAI API key (free tier works)
- pip install openai>=1.0 instructor pydantic
Setup
Install the required packages and set your OpenAI API key as an environment variable.
- Install packages:
  pip install openai instructor pydantic
- Set the environment variable in your shell:
  export OPENAI_API_KEY='your_api_key'

Step by step
Define a pydantic.BaseModel for the structured output, then create an Instructor client from the OpenAI client with instructor.from_openai. Pass stream=True and wrap the model in instructor.Partial to receive partial structured responses as they arrive.
```python
import os

import instructor
from openai import OpenAI
from pydantic import BaseModel

# Define the structured output model
class UserInfo(BaseModel):
    name: str
    age: int

# Initialize the OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Create an Instructor client wrapping the OpenAI client
inst = instructor.from_openai(client)

# Prepare messages
messages = [{"role": "user", "content": "Extract user info: John is 30 years old."}]

# Stream the structured output. instructor.Partial[UserInfo] makes every
# field optional, so incomplete objects can be yielded while tokens arrive.
stream = inst.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    response_model=instructor.Partial[UserInfo],
    stream=True,
)

print("Streaming structured output:")
for partial in stream:
    # partial is a UserInfo instance whose fields fill in incrementally
    print(partial)
# Note: the final streamed object contains the fully parsed fields.
```

Output
Streaming structured output:
UserInfo(name='J', age=None)
UserInfo(name='John', age=None)
UserInfo(name='John', age=3)
UserInfo(name='John', age=30)
Common variations
You can stream asynchronously with the AsyncOpenAI client and an async for loop, switch to different models such as gpt-4o, or stream multiple structured objects by using an Iterable response model. Instructor supports both OpenAI and Anthropic clients.
```python
import asyncio

import instructor
from openai import AsyncOpenAI

# The async client must be wrapped separately from the sync one.
ainst = instructor.from_openai(AsyncOpenAI())

async def async_stream():
    # create_partial is the streaming equivalent of stream=True with
    # instructor.Partial; on the async client it yields partials via async for.
    async for partial in ainst.chat.completions.create_partial(
        model="gpt-4o-mini",
        messages=messages,
        response_model=UserInfo,
    ):
        print(partial)

asyncio.run(async_stream())
```

Output
UserInfo(name='J', age=None)
UserInfo(name='John', age=None)
UserInfo(name='John', age=3)
UserInfo(name='John', age=30)
Troubleshooting
- If streaming yields incomplete objects or JSON parse errors, ensure your response_model matches the expected output schema exactly.
- Check your API key and environment variables if no response is received.
- Use a smaller max_tokens or simpler prompts if the stream stalls.
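The environment check in the second bullet can be automated with a small pre-flight helper that fails fast before any streaming call is made (the function name is my own, not part of Instructor):

```python
import os

def has_openai_key() -> bool:
    # True when OPENAI_API_KEY is set to a non-empty value.
    return bool(os.environ.get("OPENAI_API_KEY"))

if not has_openai_key():
    print("OPENAI_API_KEY is not set; export it before streaming.")
```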
Key Takeaways
- Use stream=True with Instructor to get incremental structured outputs.
- Define a precise pydantic.BaseModel as the response_model for real-time parsing.
- Instructor integrates seamlessly with OpenAI's Python SDK for streaming JSON responses.
- Async streaming is supported through the async OpenAI client with async for loops.
- Validate your schema and environment setup to avoid streaming parse errors.