Debug Fix intermediate · 3 min read

How to handle streaming chunks from Gemini API

Quick answer
Request a streamed response from the Gemini API by passing stream=True to the generation call, then iterate over the returned response to process chunks as they arrive. In the official Python SDK (google-generativeai), generate_content_async(..., stream=True) returns an async iterator for streamed completions, allowing you to process chunks with async for as they arrive.
ERROR TYPE code_error
⚡ QUICK FIX
Iterate over the streaming response from the Gemini API client (with async for in async code) instead of printing or returning the response object directly.

Why this happens

Developers often treat a streaming response from the Gemini API like the result of a single blocking call and try to print or return it directly. When stream=True is set, the SDK returns an iterable of partial responses (chunks), not a completed response object. If you never iterate over that response, the streamed text is never consumed, and you see a bare response object or missing output instead of the generated text.

Example of incorrect code:

python
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

response = model.generate_content(
    "Hello",
    generation_config=genai.GenerationConfig(
        temperature=0.7,
        max_output_tokens=100,
    ),
    stream=True,  # Streaming enabled
)

# Incorrect: printing the response object instead of iterating over it
print(response)  # Prints a response object, not the streamed chunks
output
<...GenerateContentResponse object at 0x...> (no streamed chunks printed)

The fix

Iterate over the streaming response so each chunk is handled as it arrives. In the official Python SDK, generate_content_async(..., stream=True) resolves to an async iterator; consume it with async for to process chunks in real time. (A synchronous variant also exists: generate_content(..., stream=True) returns a regular iterator you can consume with a plain for loop.)

This approach allows you to display partial results immediately and build a responsive UI or logging system.

python
import os
import asyncio
import google.generativeai as genai

async def stream_gemini():
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-flash")

    # Request a streamed response with the async variant of generate_content
    response = await model.generate_content_async(
        "Hello, stream this text.",
        generation_config=genai.GenerationConfig(
            temperature=0.7,
            max_output_tokens=100,
        ),
        stream=True,
    )

    # Async-iterate over the streaming chunks as they arrive
    async for chunk in response:
        print(chunk.text, end='', flush=True)

if __name__ == "__main__":
    asyncio.run(stream_gemini())
output
(model text printed progressively as chunks arrive)

Preventing it in production

  • Always iterate over the response when stream=True (with async for in async code); printing or returning the raw response object drops the chunks.
  • Implement exponential backoff and retry logic around streaming calls to handle transient network errors gracefully.
  • Validate chunk content and handle partial or empty chunks to maintain robustness.
  • Consider buffering chunks if you need to process or display them in batches.
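The retry and validation bullets above can be sketched as a generic wrapper, independent of any particular SDK. This is a minimal sketch: stream_with_retry and the simulated flaky stream are illustrative names, not part of the Gemini API, and a real version would catch the SDK's own transient-error types rather than ConnectionError.

```python
import asyncio
import random

async def stream_with_retry(make_stream, max_attempts=4, base_delay=0.5):
    """Consume a streaming call, retrying with exponential backoff.

    make_stream is a zero-arg coroutine function returning an async
    iterator of text chunks. Note that a retry restarts the stream from
    the beginning, so callers should discard text from a failed attempt.
    """
    for attempt in range(max_attempts):
        parts = []
        try:
            async for chunk in await make_stream():
                if chunk:            # validate: skip empty/partial chunks
                    parts.append(chunk)
            return "".join(parts)
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise                # out of attempts: surface the error
            # Exponential backoff with a little jitter: ~0.5s, 1s, 2s, ...
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            await asyncio.sleep(delay)

# Simulated flaky stream standing in for the Gemini call: fails once,
# then succeeds, and yields an empty chunk to exercise validation.
_calls = {"n": 0}

async def fake_stream():
    _calls["n"] += 1
    async def gen():
        if _calls["n"] == 1:
            raise ConnectionError("transient network error")
        for piece in ["Hello", ", ", "stream", ""]:
            yield piece
    return gen()

print(asyncio.run(stream_with_retry(fake_stream, base_delay=0.01)))  # → Hello, stream
```

To use this with a real call, pass a closure that starts a fresh generate_content_async(..., stream=True) request on each attempt, so every retry opens a new stream.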

Key Takeaways

  • Use async iteration to handle Gemini API streaming responses correctly.
  • Enable streaming by setting stream=True in the request parameters.
  • Implement retries and validate chunks for production robustness.
Verified 2026-04 · gemini-1.5-flash