How to handle streaming chunks from OpenAI
Pass the stream=True parameter to client.chat.completions.create(), then iterate over the returned sync or async stream and process each chunk's delta.content incrementally for real-time output. Read chunk.choices[0].delta.content defensively, since some chunks carry no content.

Why this happens
Developers often treat a streaming response from OpenAI as a single, complete object instead of an iterable stream. This leads to errors or missing partial output. For example, calling response.choices[0].message.content directly on a streaming response fails because the SDK returns a stream that yields chunks, not a complete message.
Typical broken code:
```python
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)

# Incorrect: trying to access the full content directly
print(response.choices[0].message.content)
# AttributeError: 'Stream' object has no attribute 'choices'
```
The fix
Use a for-loop to iterate over the streaming response. Each chunk contains partial content in chunk.choices[0].delta.content. Append or print these chunks incrementally to reconstruct the full message as it streams.
This works because the SDK returns an iterable stream of partial completions, allowing real-time processing.
```python
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)

full_response = ""
for chunk in response:
    delta = chunk.choices[0].delta.content
    if delta:  # skip chunks that carry no new text (e.g. the final chunk)
        print(delta, end="", flush=True)
        full_response += delta

print("\nFull response:", full_response)
# Hello
# Full response: Hello
```
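The same pattern applies with the async client, using async for instead of a plain loop. A minimal sketch, assuming an openai.AsyncOpenAI instance is passed in as `client`; the helper name stream_chat is our own:

```python
import asyncio

async def stream_chat(client, model, prompt):
    """Stream a chat completion and return the assembled text.

    `client` is assumed to be an openai.AsyncOpenAI instance."""
    stream = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    parts = []
    async for chunk in stream:
        # Guard against chunks with no choices or no new content
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)

# Usage with the real client:
#   client = AsyncOpenAI()
#   text = asyncio.run(stream_chat(client, "gpt-4o-mini", "Hello"))
```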
Preventing it in production
Implement robust retry logic with exponential backoff to handle transient network or rate limit errors during streaming. Validate that each chunk contains delta.content before processing to avoid NoneType errors. Consider buffering partial chunks if you need to process or store the full response after streaming.
Also, handle stream termination gracefully by checking chunk.choices[0].finish_reason, which is set on the final chunk.
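A minimal retry sketch along those lines, assuming the caller supplies a zero-argument callable that opens a fresh stream (e.g. a lambda wrapping client.chat.completions.create(..., stream=True)). The helper name stream_with_retry is our own:

```python
import time

def stream_with_retry(start_stream, max_retries=3, base_delay=1.0):
    """Consume a streaming completion with exponential backoff.

    A retry restarts the stream from scratch, so partial text from a
    failed attempt is discarded rather than duplicated."""
    for attempt in range(max_retries + 1):
        try:
            parts = []
            for chunk in start_stream():
                if not chunk.choices:  # defensively skip empty chunks
                    continue
                delta = chunk.choices[0].delta.content
                if delta:
                    parts.append(delta)
            return "".join(parts)
        except Exception:
            if attempt == max_retries:
                raise
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...

# Usage:
#   full = stream_with_retry(
#       lambda: client.chat.completions.create(
#           model="gpt-4o-mini",
#           messages=[{"role": "user", "content": "Hello"}],
#           stream=True,
#       )
#   )
```

Restarting the whole stream is the simplest recovery strategy; resuming mid-stream is not supported by the API, so buffering the assembled text only after a successful pass avoids duplicated output.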
Key Takeaways
- Always iterate over the streaming response to handle partial chunks correctly.
- Extract partial content from chunk.choices[0].delta.content safely to build the full output.
- Implement retries and validate chunk content to ensure robust streaming in production.