How to stream Responses API output
Quick answer
Use the OpenAI Python SDK's chat.completions.create method with stream=True to receive output in streamed chunks. Iterate over the returned sync or async generator to process tokens as they arrive for real-time display.

Prerequisites
- Python 3.8+
- An OpenAI API key (free tier works)
- pip install "openai>=1.0"
Setup
Install the official openai Python package (version 1.0 or higher) and set your OpenAI API key as an environment variable.

- Install the package:

```shell
pip install "openai>=1.0"
```

- Set the environment variable in your shell:

```shell
export OPENAI_API_KEY='your_api_key'   # Linux/macOS
setx OPENAI_API_KEY "your_api_key"     # Windows
```

Step by step
This example demonstrates synchronous streaming of a chat completion using the gpt-4o-mini model. The stream=True parameter enables streaming, and the code iterates over chunks to print tokens as they arrive.
```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

messages = [
    {"role": "user", "content": "Tell me a short story about a robot."}
]

# stream=True returns a generator of chunks instead of a single response
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    stream=True,
)

print("Streaming response:")
for chunk in stream:
    token = chunk.choices[0].delta.content  # None on the final chunk
    if token:
        print(token, end="", flush=True)
print()
```

Output

```
Streaming response:
Once upon a time, there was a curious robot named R2 who dreamed of exploring the stars.
```
Common variations
You can also stream asynchronously: use the AsyncOpenAI client, await the create call, and iterate with async for inside an async function. Change the model to gpt-4o-mini or others as needed; streaming works the same way across chat models.
```python
import os
import asyncio
from openai import AsyncOpenAI  # the async client is required for await / async for

client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def async_stream():
    messages = [{"role": "user", "content": "Explain quantum computing in simple terms."}]
    stream = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        stream=True,
    )
    print("Async streaming response:")
    async for chunk in stream:
        token = chunk.choices[0].delta.content
        if token:
            print(token, end="", flush=True)
    print()

if __name__ == "__main__":
    asyncio.run(async_stream())
```

Output

```
Async streaming response:
Quantum computing is a new type of computing that uses quantum bits, or qubits, to solve problems faster than traditional computers.
```
Troubleshooting
- If streaming does not start, verify your API key is set correctly in OPENAI_API_KEY.
- If you get empty tokens, ensure your model supports streaming (e.g., gpt-4o-mini).
- For network issues, check your internet connection and retry.
- Use flush=True in print to avoid buffering delays.
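The "retry" advice above can be sketched as a small exponential-backoff wrapper (the helper name, delays, and exception list are illustrative, not part of the SDK):

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0,
                 retry_on=(ConnectionError, TimeoutError)):
    """Call fn(), retrying with exponential backoff on transient network errors."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * (2 ** attempt))  # wait 1s, 2s, 4s, ...
```

You would wrap the call that opens the stream, e.g. `stream = with_retries(lambda: client.chat.completions.create(model="gpt-4o-mini", messages=messages, stream=True))`. Note this retries opening the stream, not failures mid-iteration.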
Key Takeaways
- Use stream=True in chat.completions.create to enable streaming output.
- Iterate over the returned generator to process tokens as they arrive for real-time display.
- Async streaming requires the AsyncOpenAI client, an async function, and async for iteration.
- Always set your API key in the OPENAI_API_KEY environment variable for authentication.
- Use models like gpt-4o-mini that support streaming for best results.