How-to · Beginner · 3 min read

OpenAI Chat Completions API streaming Python example

Quick answer
Use the OpenAI SDK's chat.completions.create method with stream=True to receive streaming responses from the OpenAI Chat Completions API in Python. Iterate over the streamed chunks to process partial outputs in real time.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key with available credit
  • pip install "openai>=1.0"

Setup

Install the official openai Python SDK version 1.0 or higher and set your OpenAI API key as an environment variable.

  • Install SDK: pip install "openai>=1.0" (quote the requirement so the shell does not treat > as a redirect)
  • Set environment variable in your shell: export OPENAI_API_KEY='your_api_key_here'
bash
pip install "openai>=1.0"

Step by step

This example demonstrates how to call the Chat Completions API with streaming enabled. It prints partial responses as they arrive.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a short poem about spring."}],
    stream=True
)

print("Streaming response:")
for chunk in response:
    # Each chunk is a ChatCompletionChunk object; delta.content holds the
    # next text fragment (it is None on role-only or final chunks).
    if chunk.choices:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
print()
output
Streaming response:
Spring blooms awake,
Soft winds gently shake,
Colors dance and play,
Welcoming the day.
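The loop above simply concatenates each chunk's delta.content into the full reply. As a minimal offline sketch of that assembly logic (fake_chunks and assemble are hypothetical stand-ins that mimic the shape of streamed chunks, not part of the SDK):

```python
from types import SimpleNamespace

def fake_chunks(parts):
    # Hypothetical stand-in for a streamed response: each item mimics a
    # ChatCompletionChunk with choices[0].delta.content set to a fragment.
    for text in parts:
        yield SimpleNamespace(
            choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
        )

def assemble(stream):
    # Concatenate each chunk's delta.content, skipping None deltas
    # (role-only or final chunks carry no text).
    pieces = []
    for chunk in stream:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            pieces.append(delta.content)
    return "".join(pieces)

full = assemble(fake_chunks(["Spring ", "blooms ", "awake."]))
print(full)  # Spring blooms awake.
```

The same pattern is useful in real code when you need the complete message after streaming, e.g. to log it or feed it back as conversation history.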

Common variations

  • Async streaming: Use AsyncOpenAI and await client.chat.completions.create(..., stream=True), then iterate with async for.
  • Different models: Replace model="gpt-4o" with other OpenAI models like gpt-4.1 or gpt-4o-mini.
  • Non-streaming: Omit stream=True to get the full response at once.
python
import asyncio
import os
from openai import AsyncOpenAI

async def async_stream():
    # The async client exposes the same chat.completions.create method,
    # awaited; with stream=True it returns an async iterator of chunks.
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Tell me a joke."}],
        stream=True
    )
    print("Async streaming response:")
    async for chunk in response:
        if chunk.choices:
            delta = chunk.choices[0].delta
            if delta.content is not None:
                print(delta.content, end="", flush=True)
    print()

asyncio.run(async_stream())
output
Async streaming response:
Why did the scarecrow win an award? Because he was outstanding in his field!

Troubleshooting

  • If you get an authentication error, verify your OPENAI_API_KEY environment variable is set correctly.
  • For network timeouts, check your internet connection and retry.
  • If streaming yields no output, confirm stream=True is set and that you print with flush=True, otherwise partial chunks may sit in the output buffer until the program ends.
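For transient network failures, one option is to retry the whole request with exponential backoff. This is a sketch under the assumption that re-sending the full request is acceptable for your use case; with_retries is a hypothetical helper, not part of the openai SDK:

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    # Call fn(); on any exception, wait base_delay * 2**attempt seconds
    # and try again, re-raising after the final attempt.
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

You could then wrap the streaming call, e.g. with_retries(lambda: client.chat.completions.create(..., stream=True)). Note this only retries the initial request; recovering from a drop mid-stream would need additional handling.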

Key Takeaways

  • Use stream=True in chat.completions.create to enable streaming responses.
  • Iterate over the response object to process partial content chunks in real time.
  • Async streaming is supported via AsyncOpenAI and async for iteration.
  • Always load your API key securely from os.environ.
  • Switch models easily by changing the model parameter.
Verified 2026-04 · gpt-4o, gpt-4.1, gpt-4o-mini