How to stream Responses API output
Quick answer
Use the OpenAI Python SDK's chat.completions.create method with stream=True to receive output in streamed chunks. Iterate over the returned sync or async generator to process tokens as they arrive for real-time display.

Prerequisites
- Python 3.8+
- An OpenAI API key (free tier works)
- pip install "openai>=1.0"
Setup
Install the official openai Python package (version 1.0 or higher) and set your OpenAI API key as an environment variable.

- Install the package:

```shell
pip install "openai>=1.0"
```

- Set the environment variable in your shell:

```shell
export OPENAI_API_KEY='your_api_key'   # Linux/macOS
setx OPENAI_API_KEY "your_api_key"     # Windows
```

Step by step
This example demonstrates synchronous streaming of a chat completion using the gpt-4o-mini model. The stream=True parameter enables streaming, and the code iterates over chunks to print tokens as they arrive.
```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

messages = [
    {"role": "user", "content": "Tell me a short story about a robot."}
]

# stream=True returns a generator of chunks instead of a single response
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    stream=True,
)

print("Streaming response:")
for chunk in stream:
    token = chunk.choices[0].delta.content  # None on the final chunk
    if token:
        print(token, end="", flush=True)
print()
```

Output

```
Streaming response:
Once upon a time, there was a curious robot named R2 who dreamed of exploring the stars.
```
Common variations
You can also stream asynchronously: use the AsyncOpenAI client, await the create call, and iterate with async for inside an async function. Change the model to gpt-4o-mini or others as needed; streaming works the same way across chat models.
```python
import os
import asyncio
from openai import AsyncOpenAI  # the async client is required for await / async for

client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def async_stream():
    messages = [{"role": "user", "content": "Explain quantum computing in simple terms."}]
    stream = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        stream=True,
    )
    print("Async streaming response:")
    async for chunk in stream:
        token = chunk.choices[0].delta.content
        if token:
            print(token, end="", flush=True)
    print()

if __name__ == "__main__":
    asyncio.run(async_stream())
```

Output

```
Async streaming response:
Quantum computing is a new type of computing that uses quantum bits, or qubits, to solve problems faster than traditional computers.
```
Troubleshooting
- If streaming does not start, verify your API key is set correctly in OPENAI_API_KEY.
- If you get empty tokens, ensure your model supports streaming (e.g., gpt-4o-mini).
- For network issues, check your internet connection and retry.
- Use flush=True in print to avoid buffering delays.
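The "retry" advice above can be sketched as a small exponential-backoff wrapper (the helper name, delays, and exception list are illustrative, not part of the SDK):

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0,
                 retry_on=(ConnectionError, TimeoutError)):
    """Call fn(), retrying with exponential backoff on transient network errors."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * (2 ** attempt))  # wait 1s, 2s, 4s, ...
```

You would wrap the call that opens the stream, e.g. `stream = with_retries(lambda: client.chat.completions.create(model="gpt-4o-mini", messages=messages, stream=True))`. Note this retries opening the stream, not failures mid-iteration.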
Key Takeaways
- Use stream=True in chat.completions.create to enable streaming output.
- Iterate over the returned generator to process tokens as they arrive for real-time display.
- Async streaming requires the AsyncOpenAI client, an async function, and async for iteration.
- Always set your API key in the OPENAI_API_KEY environment variable for authentication.
- Use models like gpt-4o-mini that support streaming for best results.