Debug Fix intermediate · 3 min read

How to handle streaming chunks from Gemini API

Quick answer
Request a streamed response from the Gemini API by passing stream=True to the generation call, then iterate over the returned response to process chunks as they arrive. In the official Python SDK (google-generativeai), generate_content_async(..., stream=True) returns an async iterator for streamed completions, allowing you to process chunks with async for as they arrive.
ERROR TYPE code_error
⚡ QUICK FIX
Iterate over the streaming response from the Gemini API client (with async for in async code) instead of printing or returning the response object directly.

Why this happens

Developers often treat a streaming response from the Gemini API like the result of a single blocking call and try to print or return it directly. When stream=True is set, the SDK returns an iterable of partial responses (chunks), not a completed response object. If you never iterate over that response, the streamed text is never consumed, and you see a bare response object or missing output instead of the generated text.

Example of incorrect code:

python
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

response = model.generate_content(
    "Hello",
    generation_config=genai.GenerationConfig(
        temperature=0.7,
        max_output_tokens=100,
    ),
    stream=True,  # Streaming enabled
)

# Incorrect: printing the response object instead of iterating over it
print(response)  # Prints a response object, not the streamed chunks
output
<...GenerateContentResponse object at 0x...> (no streamed chunks printed)

The fix

Iterate over the streaming response so each chunk is handled as it arrives. In the official Python SDK, generate_content_async(..., stream=True) resolves to an async iterator; consume it with async for to process chunks in real time. (A synchronous variant also exists: generate_content(..., stream=True) returns a regular iterator you can consume with a plain for loop.)

This approach allows you to display partial results immediately and build a responsive UI or logging system.

python
import os
import asyncio
import google.generativeai as genai

async def stream_gemini():
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-flash")

    # Request a streamed response with the async variant of generate_content
    response = await model.generate_content_async(
        "Hello, stream this text.",
        generation_config=genai.GenerationConfig(
            temperature=0.7,
            max_output_tokens=100,
        ),
        stream=True,
    )

    # Async-iterate over the streaming chunks as they arrive
    async for chunk in response:
        print(chunk.text, end='', flush=True)

if __name__ == "__main__":
    asyncio.run(stream_gemini())
output
(model text printed progressively as chunks arrive)

Preventing it in production

  • Always iterate over the response when stream=True (with async for in async code); printing or returning the raw response object drops the chunks.
  • Implement exponential backoff and retry logic around streaming calls to handle transient network errors gracefully.
  • Validate chunk content and handle partial or empty chunks to maintain robustness.
  • Consider buffering chunks if you need to process or display them in batches.
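The retry and validation bullets above can be sketched as a generic wrapper, independent of any particular SDK. This is a minimal sketch: stream_with_retry and the simulated flaky stream are illustrative names, not part of the Gemini API, and a real version would catch the SDK's own transient-error types rather than ConnectionError.

```python
import asyncio
import random

async def stream_with_retry(make_stream, max_attempts=4, base_delay=0.5):
    """Consume a streaming call, retrying with exponential backoff.

    make_stream is a zero-arg coroutine function returning an async
    iterator of text chunks. Note that a retry restarts the stream from
    the beginning, so callers should discard text from a failed attempt.
    """
    for attempt in range(max_attempts):
        parts = []
        try:
            async for chunk in await make_stream():
                if chunk:            # validate: skip empty/partial chunks
                    parts.append(chunk)
            return "".join(parts)
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise                # out of attempts: surface the error
            # Exponential backoff with a little jitter: ~0.5s, 1s, 2s, ...
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            await asyncio.sleep(delay)

# Simulated flaky stream standing in for the Gemini call: fails once,
# then succeeds, and yields an empty chunk to exercise validation.
_calls = {"n": 0}

async def fake_stream():
    _calls["n"] += 1
    async def gen():
        if _calls["n"] == 1:
            raise ConnectionError("transient network error")
        for piece in ["Hello", ", ", "stream", ""]:
            yield piece
    return gen()

print(asyncio.run(stream_with_retry(fake_stream, base_delay=0.01)))  # → Hello, stream
```

To use this with a real call, pass a closure that starts a fresh generate_content_async(..., stream=True) request on each attempt, so every retry opens a new stream.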

Key Takeaways

  • Use async iteration to handle Gemini API streaming responses correctly.
  • Enable streaming by setting stream=True in the request parameters.
  • Implement retries and validate chunks for production robustness.
Verified 2026-04 · gemini-1.5-flash