ValueError: Invalid content format
ValueError (from google.generativeai.types.content_types module)
Stack trace
ValueError: Invalid content format
  File "/path/to/site-packages/google/generativeai/types/content_types.py", line X, in _validate_content
    raise ValueError("Invalid content format")
ValueError: Content must be a string, Part, Content, or iterable of Part or string objects. Received: <class 'list'> with invalid structure.
Why it happens
Gemini's generate_content() accepts a string, a Part, a Content, or a flat iterable of Part and string objects. When a list contains an item the SDK cannot convert to a Part (for example raw bytes with no MIME type, a nested list, or a malformed dict), the content validator rejects the input. For image-plus-text prompts, put the image Part before the text that refers to it: [image_part, text_part]. Google's prompting guidance recommends that order for single-image prompts, and keeping it consistent removes one variable when debugging.
Detection
Test your multimodal prompts with a simple image+text combination before complex chains. Log the content structure before calling generate_content() to catch ordering issues early. Use type hints to ensure Parts are created correctly.
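A minimal sketch of the logging step, assuming you only need the type of each top-level item rather than its full value. The helper name describe_content is hypothetical and not part of the SDK; it uses only the standard library, so it works with whatever Part class your SDK version provides.

```python
from collections.abc import Iterable, Mapping


def describe_content(content):
    """Return one short type label per top-level item in `content`.

    Hypothetical helper for debugging; not part of google.generativeai.
    """
    # A bare string, bytes object, dict, or non-iterable counts as one item.
    if isinstance(content, (str, bytes, Mapping)) or not isinstance(content, Iterable):
        content = [content]
    labels = []
    for index, item in enumerate(content):
        label = type(item).__name__
        if isinstance(item, str):
            # Log the length only, so prompts do not flood the log.
            label += f"(len={len(item)})"
        elif isinstance(item, (list, tuple, set)):
            # Nested containers are a common cause of rejected content.
            label += " <- nested container"
        labels.append(f"[{index}] {label}")
    return labels
```

Call it just before the request, for example logging.debug("content: %s", describe_content(content)), and look for unexpected types such as bytes or nested lists.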
Causes & fixes
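Passing raw image bytes (or another object the SDK cannot convert to a Part) inside the list
Wrap the image bytes together with their MIME type before adding them to the content list, either as an explicit Part or as a blob dict. Example: genai.protos.Part(inline_data=genai.protos.Blob(mime_type='image/jpeg', data=image_bytes)), or the shorter {'mime_type': 'image/jpeg', 'data': image_bytes}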
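Image Part comes after the text Part that refers to it
Put the image Part first: [image_part, text_part] rather than [text_part, image_part]. This matches Google's guidance for single-image prompts. If the error persists after reordering, look for one of the structural problems in this list rather than the order itself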
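Mixing strings and Part objects with items that are neither, such as custom wrapper objects or half-built dicts
Plain strings and Part objects can share a list, because Gemini auto-wraps strings. Make sure every non-string item is an actual Part or a complete blob dict with both mime_type and data. Valid: [Part(...), text_string]. Invalid: [my_wrapper_object, text_string]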
Passing nested lists instead of a flat list of Part/string objects
Flatten your content structure. Gemini expects [Part, Part, string], not [[Part, Part], [string]]. If you build the list from nested sources, flatten it with itertools.chain() or a list comprehension, taking care not to iterate over the strings themselves, which would split them into single characters
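The flattening step can be sketched as below. The function name flatten_content is hypothetical. It recurses into lists and tuples only, because itertools.chain.from_iterable() on a list that contains bare strings would iterate each string character by character.

```python
def flatten_content(items):
    """Flatten nested lists/tuples into one flat list.

    Strings, bytes, dicts, and Part objects are treated as leaves and
    are left untouched. Hypothetical helper, not part of the SDK.
    """
    flat = []
    for item in items:
        if isinstance(item, (list, tuple)):
            # Recurse so arbitrarily deep nesting is handled.
            flat.extend(flatten_content(item))
        else:
            flat.append(item)
    return flat
```

Blob dicts such as {'mime_type': 'image/jpeg', 'data': image_bytes} pass through unchanged, so the result can be handed straight to generate_content().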
Code: broken vs fixed
import os
import google.generativeai as genai

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-2.0-flash')

# BROKEN: raw image bytes in the list, placed after the text
with open('photo.jpg', 'rb') as f:
    image_bytes = f.read()

response = model.generate_content([
    'Describe this image',  # Text BEFORE image
    image_bytes,            # Raw bytes with no MIME type, not wrapped in a Part
])  # Rejected by content validation
print(response.text)

import os
import google.generativeai as genai

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-2.0-flash')

# FIXED: image Part first, then text; bytes wrapped with their MIME type
with open('photo.jpg', 'rb') as f:
    image_bytes = f.read()

image_part = genai.protos.Part(
    inline_data=genai.protos.Blob(mime_type='image/jpeg', data=image_bytes)
)
response = model.generate_content([
    image_part,             # Image Part first
    'Describe this image',  # Text second; the SDK wraps plain strings
])
print(response.text)
Workaround
If you cannot refactor to use Part objects, pass the image and text as separate sequential calls: first send the image with a simple prompt, capture the response, then use that context in follow-up text-only calls. This avoids list ordering issues at the cost of extra API calls.
Prevention
Wrap image bytes explicitly before adding them to a multimodal list, either as genai.protos.Part(inline_data=genai.protos.Blob(mime_type=..., data=...)) or as a {'mime_type': ..., 'data': ...} dict. Document your content list order: images first, then text. Add a helper function that validates content structure before passing it to generate_content(), checking for proper Part wrapping and order. Annotate content parameters as list[Part | str] so that a type checker or reviewer can catch structural mistakes before runtime.
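One possible shape for such a validation helper is sketched below. The name validate_content and its parameters are hypothetical. To keep the sketch free of SDK imports, the accepted Part classes are passed in by the caller (for example part_types=(genai.protos.Part,), assuming your SDK version exposes that class). It is deliberately stricter than the SDK: it accepts only strings, blob dicts with mime_type and data, and the Part classes you pass in.

```python
from collections.abc import Mapping


def validate_content(content, part_types=(), images_first=False):
    """Raise ValueError unless `content` is a flat list of valid items.

    Hypothetical pre-flight check, not part of google.generativeai.
    part_types:   tuple of classes accepted as ready-made Parts.
    images_first: if True, reject any non-text item that follows text.
    """
    if not isinstance(content, (list, tuple)):
        raise ValueError(f"expected a list or tuple, got {type(content).__name__}")
    if not content:
        raise ValueError("content must not be empty")

    seen_text = False
    for index, item in enumerate(content):
        if isinstance(item, str):
            seen_text = True
            continue
        is_part = isinstance(item, part_types)
        is_blob = isinstance(item, Mapping) and {"mime_type", "data"} <= set(item)
        if not (is_part or is_blob):
            raise ValueError(
                f"item {index} is {type(item).__name__}; "
                "expected a string, a blob dict, or a Part"
            )
        if images_first and seen_text:
            raise ValueError(f"item {index} follows text; expected images first")
    return content
```

Because the function returns the list unchanged on success, it can wrap the argument directly: model.generate_content(validate_content(content, images_first=True)). Leave images_first off for prompts that intentionally interleave several images with text.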