How-to · Beginner · 3 min read

How to stream a run in OpenAI Assistants API

Quick answer
Pass stream=True to the OpenAI SDK's client.chat.completions.create method and iterate over the returned generator to receive tokens in real time. (Strictly speaking, the Assistants API streams runs through its own beta interface, client.beta.threads.runs.stream; the Chat Completions pattern shown below is the simplest way to stream model output and is what this guide uses.)

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0"

Setup

Install the official openai Python SDK version 1.0 or higher and set your API key as an environment variable.

  • Install the SDK: pip install "openai>=1.0" (quote the requirement so your shell does not treat >= as a redirect)
  • Set the environment variable in your shell: export OPENAI_API_KEY='your_api_key_here'
bash
pip install "openai>=1.0"
export OPENAI_API_KEY='your_api_key_here'

Step by step

This example demonstrates streaming a chat completion using the gpt-4o model. With stream=True, the SDK returns a generator of chunks instead of a single response, and the code prints each token as it arrives.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True
)

print("Streaming response:")
for chunk in response:
    # Each chunk carries a delta object with the newly generated text;
    # role-only and final chunks have delta.content set to None
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
print()
output
Streaming response:
Once upon a time, in a quiet village, there lived a curious cat named Whiskers...
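Often you also want the full text once streaming finishes. The delta-joining loop above can be factored into a small helper; the sketch below uses stand-in chunk objects built with SimpleNamespace (no API call) that mimic the SDK's choices[0].delta.content shape:

```python
from types import SimpleNamespace

def collect_stream(chunks):
    """Join the delta.content pieces of a streamed response into one string."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        if delta.content:  # role-only or final chunks carry content=None
            parts.append(delta.content)
    return "".join(parts)

# Stand-in chunks mimicking the SDK's chunk shape (choices[0].delta.content)
fake = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=c))])
    for c in ["Once ", "upon ", "a ", "time", None]
]
print(collect_stream(fake))  # prints "Once upon a time"
```

The same helper works unchanged on real chunks from client.chat.completions.create, since it only relies on the choices[0].delta.content attribute path.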

Common variations

  • Async streaming: Use the AsyncOpenAI client and async for to stream asynchronously.
  • Different models: Replace gpt-4o with other supported models like gpt-4.1 or gpt-4o-mini.
  • Non-streaming: Omit stream=True to get the full response at once.
python
import asyncio
import os
from openai import AsyncOpenAI

async def async_stream():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Stream a poem."}],
        stream=True
    )
    print("Async streaming response:")
    async for chunk in response:
        delta = chunk.choices[0].delta
        if delta.content:
            print(delta.content, end="", flush=True)
    print()

asyncio.run(async_stream())
output
Async streaming response:
Roses are red, violets are blue, streaming this poem, just for you...
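For comparison with the non-streaming variation: omitting stream=True returns one completed response whose text lives at response.choices[0].message.content rather than in per-chunk deltas. A sketch with a stand-in response object (SimpleNamespace, no API call) showing that shape:

```python
from types import SimpleNamespace

# Stand-in for a non-streaming ChatCompletion (choices[0].message.content)
response = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="A full story at once."))]
)
print(response.choices[0].message.content)  # prints "A full story at once."
```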

Troubleshooting

  • If streaming hangs or returns no data, verify your API key and network connectivity.
  • Ensure stream=True is set; without it the SDK returns one complete response instead of a generator.
  • Check for SDK version compatibility; upgrade with pip install --upgrade openai.
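A quick preflight check can surface a missing key before a request hangs. This is a sketch; require_api_key is a hypothetical helper, not part of the SDK:

```python
import os

def require_api_key(env=None):
    """Return OPENAI_API_KEY from a mapping (defaults to os.environ) or fail loudly."""
    env = os.environ if env is None else env  # accept a dict for testing
    key = env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before running")
    return key

print(require_api_key({"OPENAI_API_KEY": "sk-test"}))  # prints "sk-test"
```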

Key Takeaways

  • Use stream=True in client.chat.completions.create to enable streaming.
  • Iterate over the response generator to receive tokens in real time.
  • Async streaming uses the AsyncOpenAI client with async for syntax.
  • Always keep your SDK updated to avoid compatibility issues.
  • Set your API key securely via environment variables.
Verified 2026-04 · gpt-4o, gpt-4.1, gpt-4o-mini