How to · Beginner · 3 min read

How to use streaming with OpenAI chat completions

Quick answer
Use the OpenAI SDK's chat.completions.create method with the stream=True parameter to receive the reply incrementally. Iterate over the returned stream and print each chunk's delta content as it arrives for real-time output.

Prerequisites

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" (quote the specifier so the shell doesn't treat >= as redirection)

Setup

Install the official OpenAI Python SDK and set your API key as an environment variable.

  • Install SDK: pip install openai
  • Set environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)
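Before making any API calls, it can help to fail fast with a clear message when the key is missing. A minimal sketch; the require_api_key helper is our own, not part of the SDK:

```python
import os

def require_api_key(var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set; export it before running.")
    return key
```

Calling require_api_key() at startup surfaces a readable error instead of a less obvious authentication failure later.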

Step by step

This example demonstrates streaming chat completions using the gpt-4o model. The code prints tokens as they stream in.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a short poem about AI."}],
    stream=True
)

print("Streaming response:")
for chunk in response:
    # Each chunk carries an incremental delta; content is None on
    # chunks that only signal the role or the end of the stream.
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
output
Streaming response:
(tokens print as they arrive, e.g., "In circuits deep, where data flows..." streamed token by token; exact text varies per run)
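When you also need the complete reply (to store or post-process), you can accumulate the deltas while printing. A small sketch of a helper that consumes the stream returned by chat.completions.create; the name collect_stream is ours, not part of the SDK:

```python
def collect_stream(stream) -> str:
    """Print each delta as it arrives and return the full reply text."""
    parts = []
    for chunk in stream:
        # Skip chunks with no choices or with None content
        # (e.g., the role-only first chunk and the final stop chunk).
        content = chunk.choices[0].delta.content if chunk.choices else None
        if content is not None:
            print(content, end="", flush=True)
            parts.append(content)
    print()
    return "".join(parts)
```

Usage with the example above: full_text = collect_stream(response).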

Common variations

  • Async streaming: Use the AsyncOpenAI client and iterate with async for inside an async function.
  • Different models: Replace model="gpt-4o" with any supported chat model like gpt-4o-mini.
  • Non-streaming: Omit stream=True to get the full response at once.
python
import asyncio
import os
from openai import AsyncOpenAI

async def async_stream():
    # The async client exposes the same create method; stream=True
    # returns an async iterator of chunks.
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Tell me a joke."}],
        stream=True
    )
    print("Async streaming response:")
    async for chunk in response:
        if chunk.choices and chunk.choices[0].delta.content is not None:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()

asyncio.run(async_stream())
output
Async streaming response:
Why did the AI cross the road? To get to the other algorithm!
(printed token by token asynchronously)

Troubleshooting

  • No output during streaming: Make sure you pass flush=True to print (or call sys.stdout.flush()) and that you are iterating over the stream rather than treating it as a single response object.
  • API key errors: Verify OPENAI_API_KEY is set correctly in your environment.
  • Timeouts or disconnects: Check your network connection and retry the request.
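For the transient-disconnect case, a simple retry with exponential backoff often suffices. A hedged sketch; the with_retries wrapper is our own, and in real use you would likely pass the SDK's openai.APIConnectionError as the exception type:

```python
import time

def with_retries(make_stream, attempts=3, base_delay=1.0,
                 exceptions=(ConnectionError,)):
    """Call make_stream(), retrying with exponential backoff on failure."""
    for attempt in range(attempts):
        try:
            return make_stream()
        except exceptions:
            if attempt == attempts - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * 2 ** attempt)
```

Wrap the call that opens the stream, e.g. with_retries(lambda: client.chat.completions.create(model="gpt-4o", messages=messages, stream=True)).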

Key Takeaways

  • Use stream=True in chat.completions.create to enable streaming.
  • Iterate over the response object to receive tokens incrementally in real time.
  • Async streaming is supported via the AsyncOpenAI client with the same stream=True flag and async for iteration.
  • Always load your API key from os.environ for security and best practice.
  • Streaming improves responsiveness for chat applications by delivering tokens as they generate.
Verified 2026-04 · gpt-4o, gpt-4o-mini