
How to handle streaming errors

Quick answer
Handle streaming errors in OpenAI API calls by wrapping the streaming loop in a try-except block that catches SDK exceptions such as APIError or APIConnectionError. Retry with exponential backoff to recover from transient network failures and rate limits.
ERROR TYPE api_error
⚡ QUICK FIX
Add exponential backoff retry logic around your API call to handle RateLimitError automatically.

Why this happens

Streaming errors occur when a network interruption, an API rate limit, or a server-side failure hits a chat completion call made with stream=True. A RateLimitError is typically raised when the request is sent, whereas a dropped connection can raise an exception partway through iteration, abruptly stopping the data flow after some chunks have already arrived.

The following code works when the stream completes, but it has no error handling, so any exception raised while creating or iterating the stream terminates the program:

python
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
output
Hello, how can I assist you today?
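
To make the failure mode concrete, here is a minimal simulation that needs no network access. The generator `fake_stream` is a hypothetical stand-in for a streaming response, not part of the OpenAI SDK, and the built-in `ConnectionError` is used only to imitate a dropped connection. It shows that chunks received before the failure are kept, while everything after it is lost.

```python
def fake_stream():
    """Hypothetical stand-in for a streaming response that fails midway."""
    yield "Hello, "
    yield "how can "
    raise ConnectionError("connection dropped")  # simulated network failure


received = []
error = None
try:
    for piece in fake_stream():
        received.append(piece)
except ConnectionError as exc:
    # Iteration stops here; later chunks are never delivered.
    error = exc

partial = "".join(received)  # only the text that arrived before the failure
```

Without the try-except, the same exception would propagate out of the for loop and end the program with only the partial text printed.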

The fix

Wrap the streaming call in a try-except block and implement exponential backoff retries to handle transient errors gracefully. This approach retries the streaming request after increasing delays, preventing immediate failure on recoverable errors.

This snippet makes up to 3 attempts, waiting 1 second and then 2 seconds between them:

python
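import os
import time

import httpx  # installed as a dependency of the openai package
from openai import APIConnectionError, InternalServerError, OpenAI, RateLimitError

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Errors that are usually transient and worth retrying. Assumption: connection
# drops during iteration may surface as httpx transport errors rather than SDK
# exceptions, so both are listed.
RETRYABLE = (APIConnectionError, RateLimitError, InternalServerError, httpx.TransportError)

max_retries = 3  # total attempts, including the first
retry_delay = 1  # seconds before the first retry

for attempt in range(max_retries):
    try:
        stream = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "Hello"}],
            stream=True,
        )
        for chunk in stream:
            # Skip chunks that carry no choices before indexing into them.
            if chunk.choices:
                print(chunk.choices[0].delta.content or "", end="", flush=True)
        break  # success, exit retry loop
    except RETRYABLE as e:
        # Text printed before the failure stays on screen; a retry starts over.
        print(f"\nStreaming error: {e}")
        if attempt < max_retries - 1:
            time.sleep(retry_delay)
            retry_delay *= 2  # exponential backoff: 1s, then 2s
        else:
            print("Max retries reached. Streaming failed.")
            raise  # surface the final failure instead of swallowing it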
output
Hello, how can I assist you today?
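
One caveat with retrying a stream is that each retry restarts the response, so text printed before the failure appears twice. A possible way around this, sketched below, is to buffer each attempt and return the text only once the stream completes. The helper `collect_with_retry` and its parameters are hypothetical names introduced for illustration; `make_stream` is assumed to be any zero-argument callable that returns a fresh iterable of text pieces.

```python
import time
from typing import Callable, Iterable, Tuple, Type


def collect_with_retry(
    make_stream: Callable[[], Iterable[str]],
    retryable: Tuple[Type[BaseException], ...],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Collect a text stream, restarting from scratch on retryable errors."""
    for attempt in range(max_retries):
        parts = []  # fresh buffer per attempt, so partial text is discarded
        try:
            for piece in make_stream():
                parts.append(piece)
            return "".join(parts)
        except retryable:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise ValueError("max_retries must be at least 1")
```

The trade-off is that buffering gives up incremental display: the caller sees nothing until an attempt succeeds. If live output matters more than avoiding duplicates, printing as chunks arrive may still be the better choice.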

Preventing it in production

  • Use robust retry logic with exponential backoff and jitter to avoid hammering the API during outages or rate limits.
  • Validate API keys and network connectivity before streaming calls.
  • Implement fallback mechanisms such as switching to non-streaming completions or cached responses if streaming repeatedly fails.
  • Monitor error rates and alert on spikes to proactively address API or network issues.
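
The jitter mentioned in the first point can be sketched as follows. This uses the "full jitter" variant, which picks a random delay between zero and an exponentially growing ceiling; the function name `backoff_delay` and the default values for `base` and `cap` are illustrative assumptions, not recommendations from the OpenAI documentation.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Return a randomized delay in seconds for a zero-based attempt number."""
    # Exponential ceiling: base, 2*base, 4*base, ... limited to `cap`.
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: choose uniformly in [0, ceiling] so that many clients
    # failing at the same moment do not all retry at the same moment.
    return random.uniform(0, ceiling)
```

In a retry loop, `time.sleep(backoff_delay(attempt))` would take the place of a fixed doubling delay.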

Key Takeaways

  • Always wrap streaming calls in try-except blocks to catch runtime errors.
  • Implement exponential backoff retries to handle transient streaming failures.
  • Monitor and alert on streaming error rates to maintain production reliability.
Verified 2026-04 · gpt-4o-mini