Debug Fix intermediate · 3 min read

How to handle streaming events from Claude API

Quick answer
Use the stream=True parameter in client.messages.create() to receive streaming responses from the Claude API. Iterate over the returned generator to process partial events as they arrive, enabling real-time handling of the assistant's output.
ERROR TYPE code_error
⚡ QUICK FIX
Add the stream=True parameter and iterate over the response generator to handle streaming events properly.

Why this happens

Developers often call client.messages.create() without the stream=True parameter and expect incremental output. Without it, the SDK waits until the entire completion has been generated and returns a single response object, so no streaming events are ever delivered. The following code illustrates the blocking behavior:

python
import anthropic
import os

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=500,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Hello, stream this response."}]
)
print(response.content[0].text)
output
Hello, stream this response. (full text output after completion)

The fix

To handle streaming events, add stream=True to client.messages.create(). This returns a generator that yields typed server-sent events (message_start, content_block_delta, message_stop, and so on) as they arrive; the text itself is carried by content_block_delta events. Iterate over the generator to process each event in real time. This reduces latency and enables UI updates or token-by-token processing. (Recent versions of the Python SDK also offer a higher-level client.messages.stream() context-manager helper that exposes a plain text iterator.)

python
import anthropic
import os

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

stream = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=500,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Hello, stream this response."}],
    stream=True
)

for event in stream:
    # Events are typed; text arrives as deltas in content_block_delta events
    if event.type == "content_block_delta" and event.delta.type == "text_delta":
        print(event.delta.text, end='', flush=True)

print()  # Newline after streaming completes
output
Hello, stream this response. (printed incrementally as tokens arrive)
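Because the generator yields several event types, a small helper that filters for text deltas keeps the loop body clean. A minimal sketch (the event shapes mirror the Messages streaming API; extract_text itself is a hypothetical helper, not part of the SDK):

```python
def extract_text(event):
    """Return the text carried by one streaming event, or '' if it has none.

    In the Messages streaming API, text arrives in content_block_delta
    events whose delta is a text_delta; other event types (message_start,
    content_block_stop, message_stop, ...) carry no printable text.
    """
    if getattr(event, "type", None) != "content_block_delta":
        return ""
    delta = getattr(event, "delta", None)
    if delta is None or getattr(delta, "type", None) != "text_delta":
        return ""
    return delta.text
```

With it, the streaming loop shrinks to a single line: print(extract_text(event), end='', flush=True) for every event.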

Preventing it in production

Implement robust retry logic with exponential backoff to handle transient network or rate limit errors during streaming. Validate that stream=True is set when streaming is desired. Use timeouts and cancellation to avoid hanging streams. Consider fallback to non-streaming calls if streaming fails. Monitor streaming latency and partial event integrity to ensure smooth user experience.
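The retry advice above can be sketched as a generic wrapper. The exception types below are stand-ins; in real code, map them to the SDK's actual error classes (rate-limit and connection errors):

```python
import random
import time

def with_backoff(make_stream, max_attempts=4, base_delay=1.0,
                 retryable=(ConnectionError, TimeoutError)):
    """Call make_stream(), retrying transient failures with exponential backoff.

    make_stream is a zero-argument callable that opens a fresh stream.
    Delays grow as base_delay * 2**attempt, plus a little jitter so that
    concurrent clients do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return make_stream()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```

Usage would be with_backoff(lambda: client.messages.create(..., stream=True)); note that a retry reopens the stream from the start, so deduplicate any partial text already shown to the user.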

Key Takeaways

  • Always set stream=True in client.messages.create() to enable streaming from the Claude API.
  • Iterate over the returned generator to process partial completion events in real time.
  • Implement retries and timeouts to handle network issues during streaming.
  • Streaming reduces latency and improves user experience by delivering tokens incrementally.
  • Validate streaming setup in testing before deploying to production.
Verified 2026-04 · claude-3-5-sonnet-20241022