How-to · Beginner · 3 min read

Fix streaming response cut-off

Quick answer
To fix streamed responses that appear cut off when using the OpenAI Python SDK, iterate over the streaming generator to completion and concatenate chunk.choices[0].delta.content from every chunk. Avoid breaking out of the loop early, and flush printed output so nothing lingers in the buffer. Pass stream=True and process all chunks to assemble the complete streamed text.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" (the quotes keep the shell from treating >= as output redirection)

Setup

Install the official openai Python package version 1.0 or higher and set your API key as an environment variable.

  • Install package: pip install openai
  • Set environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows; takes effect in newly opened terminals)
bash
pip install openai
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (50 kB)
Installing collected packages: openai
Successfully installed openai-1.x.x

Step by step

Use the OpenAI SDK's streaming feature correctly by iterating over the response generator to completion and concatenating the content from each chunk. This prevents cut-off text caused by incomplete iteration or early termination.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

messages = [{"role": "user", "content": "Explain the benefits of streaming responses."}]

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    stream=True
)

full_response = ""
for chunk in stream:
    delta = chunk.choices[0].delta.content  # attribute access in SDK v1.x (delta is not a dict); None on role-only chunks
    if delta:
        print(delta, end='', flush=True)  # Optional: real-time output
        full_response += delta

print("\nFull response received:")
print(full_response)
output
Streaming allows partial results to be received immediately, reducing latency and improving user experience.
Full response received:
Streaming allows partial results to be received immediately, reducing latency and improving user experience.

Common variations

The SDK also supports async streaming through the AsyncOpenAI client: iterate over the stream with async for. You can also swap in models such as gpt-4o-mini, and other providers expose similar streaming patterns.

python
import asyncio
import os
from openai import AsyncOpenAI  # async client required for async for iteration

async def async_stream():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    messages = [{"role": "user", "content": "Tell me a joke."}]
    stream = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        stream=True
    )
    full_response = ""
    async for chunk in stream:
        delta = chunk.choices[0].delta.content  # None on role-only chunks
        if delta:
            print(delta, end='', flush=True)
            full_response += delta
    print("\nFull async response:")
    print(full_response)

asyncio.run(async_stream())
output
Why did the scarecrow win an award? Because he was outstanding in his field!
Full async response:
Why did the scarecrow win an award? Because he was outstanding in his field!
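Many providers' streaming chunks follow a similar shape, so the accumulation loop can be factored into a small reusable helper. The sketch below uses hypothetical stand-in chunk objects (built with SimpleNamespace) in place of real SDK chunks so it runs without an API key:

```python
from types import SimpleNamespace

def collect_stream(stream):
    """Concatenate delta content from every chunk, skipping empty deltas."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)

def fake_chunk(text):
    # Hypothetical stand-in mimicking the chunk.choices[0].delta.content shape.
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

demo = [fake_chunk("Why did the "), fake_chunk(None), fake_chunk("chicken cross the road?")]
print(collect_stream(demo))  # -> Why did the chicken cross the road?
```

With a live stream, you would pass the generator returned by client.chat.completions.create(..., stream=True) straight to collect_stream.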

Troubleshooting

  • Cut-off responses: ensure you iterate over the entire stream and do not break out of the loop early.
  • No output: Check your API key and network connectivity.
  • Partial output in console: Use flush=True in print() to force immediate output.
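A cut-off can also come from the API itself rather than your loop: choices[0].finish_reason is None on every chunk until the last one, where it becomes "stop" for a complete answer or "length" when the max_tokens limit truncated it. A runnable sketch of the check, again using hypothetical stand-in chunks in place of real SDK chunks:

```python
from types import SimpleNamespace

def make_chunk(content, finish_reason=None):
    # Hypothetical stand-in for an SDK ChatCompletionChunk: only the fields we read.
    choice = SimpleNamespace(delta=SimpleNamespace(content=content), finish_reason=finish_reason)
    return SimpleNamespace(choices=[choice])

# Simulated stream that ends early because the token limit was reached.
stream = [
    make_chunk("Streaming lets you "),
    make_chunk("show partial resul"),
    make_chunk(None, finish_reason="length"),  # truncated by max_tokens
]

full_response = ""
finish_reason = None
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        full_response += delta
    if chunk.choices[0].finish_reason is not None:
        finish_reason = chunk.choices[0].finish_reason

if finish_reason == "length":
    print("Response was truncated by the token limit.")
print(full_response)
```

If finish_reason is "length", the fix is a larger max_tokens (or omitting it), not a change to the iteration loop.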

Key Takeaways

  • Always iterate the streaming generator to completion to avoid cut-off text.
  • Concatenate chunk.choices[0].delta.content for complete streamed text.
  • Use flush=True in print statements for real-time output.
  • Async streaming requires async iteration over the response.
  • Verify API key and network if streaming yields no or partial output.
Verified 2026-04 · gpt-4o, gpt-4o-mini