How to · Beginner · 3 min read

How to use streaming with OpenAI chat completions

Quick answer
Use the OpenAI SDK's chat.completions.create method with the stream=True parameter to receive the reply incrementally. Iterate over the returned stream and print each chunk's delta content as it arrives for real-time output.

Prerequisites

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" (quote the specifier so the shell doesn't treat >= as redirection)

Setup

Install the official OpenAI Python SDK and set your API key as an environment variable.

  • Install SDK: pip install openai
  • Set environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)
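Before making any API calls, it can help to fail fast with a clear message when the key is missing. A minimal sketch; the require_api_key helper is our own, not part of the SDK:

```python
import os

def require_api_key(var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set; export it before running.")
    return key
```

Calling require_api_key() at startup surfaces a readable error instead of a less obvious authentication failure later.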

Step by step

This example demonstrates streaming chat completions using the gpt-4o model. The code prints tokens as they stream in.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a short poem about AI."}],
    stream=True
)

print("Streaming response:")
for chunk in response:
    # Each chunk carries an incremental delta; content is None on
    # chunks that only signal the role or the end of the stream.
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
output
Streaming response:
(tokens print as they arrive, e.g., "In circuits deep, where data flows..." streamed token by token; exact text varies per run)
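When you also need the complete reply (to store or post-process), you can accumulate the deltas while printing. A small sketch of a helper that consumes the stream returned by chat.completions.create; the name collect_stream is ours, not part of the SDK:

```python
def collect_stream(stream) -> str:
    """Print each delta as it arrives and return the full reply text."""
    parts = []
    for chunk in stream:
        # Skip chunks with no choices or with None content
        # (e.g., the role-only first chunk and the final stop chunk).
        content = chunk.choices[0].delta.content if chunk.choices else None
        if content is not None:
            print(content, end="", flush=True)
            parts.append(content)
    print()
    return "".join(parts)
```

Usage with the example above: full_text = collect_stream(response).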

Common variations

  • Async streaming: Use the AsyncOpenAI client and iterate with async for inside an async function.
  • Different models: Replace model="gpt-4o" with any supported chat model like gpt-4o-mini.
  • Non-streaming: Omit stream=True to get the full response at once.
python
import asyncio
import os
from openai import AsyncOpenAI

async def async_stream():
    # The async client exposes the same create method; stream=True
    # returns an async iterator of chunks.
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Tell me a joke."}],
        stream=True
    )
    print("Async streaming response:")
    async for chunk in response:
        if chunk.choices and chunk.choices[0].delta.content is not None:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()

asyncio.run(async_stream())
output
Async streaming response:
Why did the AI cross the road? To get to the other algorithm!
(printed token by token asynchronously)

Troubleshooting

  • No output during streaming: Make sure you pass flush=True to print (or call sys.stdout.flush()) and that you are iterating over the stream rather than treating it as a single response object.
  • API key errors: Verify OPENAI_API_KEY is set correctly in your environment.
  • Timeouts or disconnects: Check your network connection and retry the request.
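For the transient-disconnect case, a simple retry with exponential backoff often suffices. A hedged sketch; the with_retries wrapper is our own, and in real use you would likely pass the SDK's openai.APIConnectionError as the exception type:

```python
import time

def with_retries(make_stream, attempts=3, base_delay=1.0,
                 exceptions=(ConnectionError,)):
    """Call make_stream(), retrying with exponential backoff on failure."""
    for attempt in range(attempts):
        try:
            return make_stream()
        except exceptions:
            if attempt == attempts - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * 2 ** attempt)
```

Wrap the call that opens the stream, e.g. with_retries(lambda: client.chat.completions.create(model="gpt-4o", messages=messages, stream=True)).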

Key Takeaways

  • Use stream=True in chat.completions.create to enable streaming.
  • Iterate over the response object to receive tokens incrementally in real time.
  • Async streaming is supported via the AsyncOpenAI client with the same stream=True flag and async for iteration.
  • Always load your API key from os.environ for security and best practice.
  • Streaming improves responsiveness for chat applications by delivering tokens as they generate.
Verified 2026-04 · gpt-4o, gpt-4o-mini