How-to · Beginner · 3 min read

How to stream a run in OpenAI Assistants API

Quick answer
Pass stream=True to the OpenAI SDK's client.chat.completions.create method and iterate over the returned generator to receive tokens in real time. (Strictly speaking, the Assistants API streams runs through its own beta interface, client.beta.threads.runs.stream; the Chat Completions pattern shown below is the simplest way to stream model output and is what this guide uses.)

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0"

Setup

Install the official openai Python SDK version 1.0 or higher and set your API key as an environment variable.

  • Install the SDK: pip install "openai>=1.0" (quote the requirement so your shell does not treat >= as a redirect)
  • Set the environment variable in your shell: export OPENAI_API_KEY='your_api_key_here'
bash
pip install "openai>=1.0"
export OPENAI_API_KEY='your_api_key_here'

Step by step

This example demonstrates streaming a chat completion using the gpt-4o model. With stream=True, the SDK returns a generator of chunks instead of a single response, and the code prints each token as it arrives.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True
)

print("Streaming response:")
for chunk in response:
    # Each chunk carries a delta object with the newly generated text;
    # role-only and final chunks have delta.content set to None
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
print()
output
Streaming response:
Once upon a time, in a quiet village, there lived a curious cat named Whiskers...
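Often you also want the full text once streaming finishes. The delta-joining loop above can be factored into a small helper; the sketch below uses stand-in chunk objects built with SimpleNamespace (no API call) that mimic the SDK's choices[0].delta.content shape:

```python
from types import SimpleNamespace

def collect_stream(chunks):
    """Join the delta.content pieces of a streamed response into one string."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        if delta.content:  # role-only or final chunks carry content=None
            parts.append(delta.content)
    return "".join(parts)

# Stand-in chunks mimicking the SDK's chunk shape (choices[0].delta.content)
fake = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=c))])
    for c in ["Once ", "upon ", "a ", "time", None]
]
print(collect_stream(fake))  # prints "Once upon a time"
```

The same helper works unchanged on real chunks from client.chat.completions.create, since it only relies on the choices[0].delta.content attribute path.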

Common variations

  • Async streaming: Use the AsyncOpenAI client and async for to stream asynchronously.
  • Different models: Replace gpt-4o with other supported models like gpt-4.1 or gpt-4o-mini.
  • Non-streaming: Omit stream=True to get the full response at once.
python
import asyncio
import os
from openai import AsyncOpenAI

async def async_stream():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Stream a poem."}],
        stream=True
    )
    print("Async streaming response:")
    async for chunk in response:
        delta = chunk.choices[0].delta
        if delta.content:
            print(delta.content, end="", flush=True)
    print()

asyncio.run(async_stream())
output
Async streaming response:
Roses are red, violets are blue, streaming this poem, just for you...
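For comparison with the non-streaming variation: omitting stream=True returns one completed response whose text lives at response.choices[0].message.content rather than in per-chunk deltas. A sketch with a stand-in response object (SimpleNamespace, no API call) showing that shape:

```python
from types import SimpleNamespace

# Stand-in for a non-streaming ChatCompletion (choices[0].message.content)
response = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="A full story at once."))]
)
print(response.choices[0].message.content)  # prints "A full story at once."
```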

Troubleshooting

  • If streaming hangs or returns no data, verify your API key and network connectivity.
  • Ensure stream=True is set; without it the SDK returns one complete response instead of a generator.
  • Check for SDK version compatibility; upgrade with pip install --upgrade openai.
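A quick preflight check can surface a missing key before a request hangs. This is a sketch; require_api_key is a hypothetical helper, not part of the SDK:

```python
import os

def require_api_key(env=None):
    """Return OPENAI_API_KEY from a mapping (defaults to os.environ) or fail loudly."""
    env = os.environ if env is None else env  # accept a dict for testing
    key = env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before running")
    return key

print(require_api_key({"OPENAI_API_KEY": "sk-test"}))  # prints "sk-test"
```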

Key Takeaways

  • Use stream=True in client.chat.completions.create to enable streaming.
  • Iterate over the response generator to receive tokens in real time.
  • Async streaming uses the AsyncOpenAI client with async for syntax.
  • Always keep your SDK updated to avoid compatibility issues.
  • Set your API key securely via environment variables.
Verified 2026-04 · gpt-4o, gpt-4.1, gpt-4o-mini