High severity · Intermediate · Fix: 2-5 min

StopIteration or AttributeError on iterate_chunks()

google.generativeai.types.stream.AsyncGenerateContentResponse.iterate_chunks() / GenerateContentResponse.iterate_chunks()

What this error means

Gemini's stream response iterate_chunks() method fails when the stream is not properly initialized, already consumed, or called on a non-streaming response object.

Stack trace

traceback

Traceback (most recent call last):
  File "app.py", line 42, in <module>
    for chunk in response.iterate_chunks():
  File "google/generativeai/types/stream.py", line 156, in iterate_chunks
    raise StopIteration
StopIteration

OR

Traceback (most recent call last):
  File "app.py", line 42, in <module>
    for chunk in response.iterate_chunks():
AttributeError: 'GenerateContentResponse' object has no attribute 'iterate_chunks'

QUICK FIX

Add stream=True to your generate_content() call and ensure you're iterating with a for loop immediately after calling the method (do not store and reuse the stream object).

Why it happens

The iterate_chunks() method is only available on streaming responses created with stream=True in generate_content(). When you call generate_content() without stream=True, you get a static GenerateContentResponse object that doesn't have the iterate_chunks() method. Additionally, if the stream was not properly awaited (in async context) or if you try to iterate twice on the same exhausted stream, StopIteration is raised. The method also fails if the response object is None or the connection drops before chunks arrive.
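The exhaustion case can be reproduced with plain Python iterators. The sketch below uses a hypothetical `fake_stream()` generator as a stand-in for a streaming response (it is not part of the Gemini SDK). It shows that a second pass over a consumed stream yields nothing, and that an explicit `next()` on it raises StopIteration.

```python
def fake_stream():
    """Hypothetical stand-in for a streaming response: yields text chunks once."""
    yield from ["Hello", ", ", "world"]


stream = fake_stream()
first_pass = list(stream)   # consumes every chunk
second_pass = list(stream)  # the iterator is already exhausted, so this is empty

try:
    next(stream)            # explicit next() on an exhausted iterator
    raised = False
except StopIteration:
    raised = True
```

A plain `for` loop absorbs StopIteration and simply ends, so a silent empty loop on the second pass is often the only visible symptom.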

Detection

Before calling iterate_chunks(), log type(response) and check hasattr(response, 'iterate_chunks'), and verify that stream=True was passed to generate_content(). An isinstance() check against GenerateContentResponse alone is not enough, because the AttributeError shown above is raised on an object with exactly that class name.
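One way to log what a response object supports before iterating is a small inspection helper. This is a minimal sketch: `describe_response` and its dictionary keys are hypothetical names chosen for illustration, and it relies only on `hasattr()` and `type()`, not on any SDK internals.

```python
import logging

logger = logging.getLogger(__name__)


def describe_response(response):
    """Return a dict describing what `response` supports (hypothetical helper)."""
    info = {
        "type": type(response).__name__,
        "is_none": response is None,
        "has_iterate_chunks": hasattr(response, "iterate_chunks"),
        "is_iterable": hasattr(response, "__iter__"),
        "is_async_iterable": hasattr(response, "__aiter__"),
    }
    # Logged at DEBUG level so the check can stay in production code paths
    logger.debug("Response capabilities: %s", info)
    return info
```

Calling `describe_response(response)` right after `generate_content()` makes a missing method or a `None` response visible in the logs before the loop fails.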

Causes & fixes

Called iterate_chunks() on a non-streaming response (generate_content() without stream=True)

✓ Fix

Add stream=True parameter to generate_content(): response = model.generate_content(prompt, stream=True)

Tried to iterate the same stream twice or after it was already exhausted

✓ Fix

Consume the stream once and collect the chunks into a list, then reuse the list. If you need to stream again, call generate_content(prompt, stream=True) again to get a fresh response. Do NOT iterate the same stream object twice.
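The collect-once pattern might look like the sketch below. The `SimpleNamespace` objects are stand-ins for chunks that expose a `.text` attribute, and `collect_chunks` is a hypothetical helper name. With the real SDK you would pass the streaming response in place of `fake_stream`.

```python
from types import SimpleNamespace


def collect_chunks(stream):
    """Consume a stream exactly once and return its chunks as a list."""
    return list(stream)


# Stand-in for a streaming response: an iterator of objects with a .text attribute
fake_stream = iter([SimpleNamespace(text="Quantum "), SimpleNamespace(text="computing")])

chunks = collect_chunks(fake_stream)

# The list can be traversed as many times as needed
full_text = "".join(chunk.text for chunk in chunks)
chunk_lengths = [len(chunk.text) for chunk in chunks]
```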

Using async API without proper await, or mixing sync/async calls incorrectly

✓ Fix

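For async, await the async call and iterate with async for: response = await model.generate_content_async(prompt, stream=True), then async for chunk in response: ... For sync, use the blocking generate_content() without await. Never mix sync generate_content() with async iteration.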
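The difference between sync and async iteration can be shown without the SDK. In this sketch, `fake_async_stream` and `read_all` are hypothetical names; the async generator stands in for an awaited streaming response. A plain `for` loop over it fails with TypeError, while `async for` works.

```python
import asyncio


async def fake_async_stream():
    """Hypothetical stand-in for an async streaming response."""
    for piece in ["async ", "chunks ", "arrive"]:
        await asyncio.sleep(0)  # yield control, as a network read would
        yield piece


async def read_all(stream):
    """Consume an async stream with `async for` and join the pieces."""
    parts = []
    async for chunk in stream:
        parts.append(chunk)
    return "".join(parts)


result = asyncio.run(read_all(fake_async_stream()))

# Mixing styles: a plain `for` over an async stream raises TypeError
try:
    for _ in fake_async_stream():
        pass
    mixed_failed = False
except TypeError:
    mixed_failed = True
```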

Response object is None due to API error, missing API key, or network failure before stream initialized

✓ Fix

Wrap generate_content() in try/except, check that the response is not None before iterating, and verify the API key is set, either in the GOOGLE_API_KEY environment variable or via genai.configure(api_key=...).
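A guard along these lines could be sketched as follows. `safe_generate` is a hypothetical wrapper, and catching the broad `Exception` is an assumption made for brevity, since the exact exception classes depend on the installed SDK version. `model` is any object with a `generate_content(prompt, stream=True)` method.

```python
import os


def safe_generate(model, prompt, env_var="GOOGLE_API_KEY"):
    """Start a streaming call, failing early with a clear message (hypothetical helper)."""
    if not os.environ.get(env_var):
        raise RuntimeError(f"{env_var} is not set")
    try:
        response = model.generate_content(prompt, stream=True)
    except Exception as exc:  # broad on purpose in this sketch; narrow it in real code
        raise RuntimeError(f"generate_content() failed: {exc}") from exc
    if response is None:
        raise RuntimeError("generate_content() returned None")
    return response
```

If you configure the key only through `genai.configure()`, drop the environment check or adapt it to your setup.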

Code: broken vs fixed

Broken - triggers the error

python

import google.generativeai as genai
import os

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-2.0-flash')

prompt = 'Explain quantum computing in 500 words.'
response = model.generate_content(prompt)  # ❌ Missing stream=True

# This line will fail with AttributeError
for chunk in response.iterate_chunks():
    print(chunk.text)

Fixed - works correctly

python

import google.generativeai as genai
import os

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-2.0-flash')

prompt = 'Explain quantum computing in 500 words.'
response = model.generate_content(prompt, stream=True)  # ✅ Added stream=True

# Iterate the streaming response directly; each item is one chunk
for chunk in response:
    if chunk.text:
        print(chunk.text, end='', flush=True)

print('\nStream completed successfully.')

Passing stream=True to generate_content() returns a streaming response, and the for loop consumes it immediately. The example iterates the response object directly, which is the streaming pattern shown in the google-generativeai documentation and does not depend on iterate_chunks() being present in your installed version.

⚠

Workaround

If you cannot add stream=True immediately (legacy code constraints), collect the response text and manually split it into chunks: response = model.generate_content(prompt); chunks = response.text.split(' '); process each chunk. This loses real-time streaming benefit but allows chunk-by-chunk processing.
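Splitting on a single space produces one-word chunks. A slightly more flexible variant groups several words per chunk. The sketch below assumes you already have the full text as a string; `split_into_chunks` and `words_per_chunk` are hypothetical names.

```python
def split_into_chunks(text, words_per_chunk=5):
    """Split text into chunks of at most `words_per_chunk` whitespace-separated words."""
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


chunks = split_into_chunks("one two three four five six seven", words_per_chunk=3)
```

With the real API you would call `split_into_chunks(response.text)` on the non-streaming response.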

✓

Prevention

Always enable streaming at the API call site if you need chunk-level access: use stream=True by default for LLM responses you'll process incrementally. Create a helper function that enforces streaming: def stream_generate(model, prompt): return model.generate_content(prompt, stream=True). This prevents non-streaming calls from sneaking into production. Test stream iteration explicitly in unit tests to catch iterate_chunks() failures before deployment.
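A unit test for the helper described above might look like this sketch. It uses `unittest.mock.MagicMock` in place of a real model, so no API key or network access is needed. The test function name is hypothetical.

```python
from unittest.mock import MagicMock


def stream_generate(model, prompt):
    """Always request a streaming response."""
    return model.generate_content(prompt, stream=True)


def test_stream_generate_requests_streaming():
    model = MagicMock()
    # The mock returns a plain iterator as a stand-in for a streaming response
    model.generate_content.return_value = iter(["a", "b"])

    chunks = list(stream_generate(model, "hello"))

    # Fails if stream=True is ever dropped from the helper
    model.generate_content.assert_called_once_with("hello", stream=True)
    assert chunks == ["a", "b"]
```

Run it with pytest or call the function directly. The assertion on the call arguments is what catches a non-streaming call before deployment.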

Python 3.8+ · google-generativeai >=0.3.0 · tested on 0.7.x

Verified 2026-04 · gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash
