How to stream structured outputs with Instructor
Quick answer
Use Instructor with the stream=True parameter in the chat.completions.create method to receive structured outputs incrementally. Define a pydantic BaseModel, wrap it in instructor.Partial, and pass it as the response_model to parse streamed JSON data in real time.

Prerequisites
- Python 3.8+
- An OpenAI API key (free tier works)
- pip install openai>=1.0 instructor pydantic
Setup
Install the required packages and set your OpenAI API key as an environment variable.
- Install packages:
  pip install openai instructor pydantic
- Set the environment variable in your shell:
  export OPENAI_API_KEY='your_api_key'

Step by step
Define a pydantic.BaseModel for the structured output, then create an Instructor client from the OpenAI client with instructor.from_openai. Pass stream=True and wrap the model in instructor.Partial to receive partial structured responses as they arrive.
```python
import os

import instructor
from openai import OpenAI
from pydantic import BaseModel

# Define the structured output model
class UserInfo(BaseModel):
    name: str
    age: int

# Initialize the OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Create an Instructor client wrapping the OpenAI client
inst = instructor.from_openai(client)

# Prepare messages
messages = [{"role": "user", "content": "Extract user info: John is 30 years old."}]

# Stream the structured output. instructor.Partial[UserInfo] makes every
# field optional, so incomplete objects can be yielded while tokens arrive.
stream = inst.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    response_model=instructor.Partial[UserInfo],
    stream=True,
)

print("Streaming structured output:")
for partial in stream:
    # partial is a UserInfo instance whose fields fill in incrementally
    print(partial)
# Note: the final streamed object contains the fully parsed fields.
```

Output
Streaming structured output:
UserInfo(name='J', age=None)
UserInfo(name='John', age=None)
UserInfo(name='John', age=3)
UserInfo(name='John', age=30)
Common variations
You can stream asynchronously with the AsyncOpenAI client and an async for loop, switch to different models such as gpt-4o, or stream multiple structured objects by using an Iterable response model. Instructor supports both OpenAI and Anthropic clients.
```python
import asyncio

import instructor
from openai import AsyncOpenAI

# The async client must be wrapped separately from the sync one.
ainst = instructor.from_openai(AsyncOpenAI())

async def async_stream():
    # create_partial is the streaming equivalent of stream=True with
    # instructor.Partial; on the async client it yields partials via async for.
    async for partial in ainst.chat.completions.create_partial(
        model="gpt-4o-mini",
        messages=messages,
        response_model=UserInfo,
    ):
        print(partial)

asyncio.run(async_stream())
```

Output
UserInfo(name='J', age=None)
UserInfo(name='John', age=None)
UserInfo(name='John', age=3)
UserInfo(name='John', age=30)
Troubleshooting
- If streaming yields incomplete objects or JSON parse errors, ensure your response_model matches the expected output schema exactly.
- Check your API key and environment variables if no response is received.
- Use a smaller max_tokens or simpler prompts if the stream stalls.
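The environment check in the second bullet can be automated with a small pre-flight helper that fails fast before any streaming call is made (the function name is my own, not part of Instructor):

```python
import os

def has_openai_key() -> bool:
    # True when OPENAI_API_KEY is set to a non-empty value.
    return bool(os.environ.get("OPENAI_API_KEY"))

if not has_openai_key():
    print("OPENAI_API_KEY is not set; export it before streaming.")
```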
Key Takeaways
- Use stream=True with Instructor to get incremental structured outputs.
- Define a precise pydantic.BaseModel as the response_model for real-time parsing.
- Instructor integrates seamlessly with OpenAI's Python SDK for streaming JSON responses.
- Async streaming is supported through the async OpenAI client with async for loops.
- Validate your schema and environment setup to avoid streaming parse errors.