How to send image to Gemini API in python
Direct answer
Use the google-generativeai Python SDK: pass the raw image bytes (with the correct MIME type) alongside your prompt text in a single model.generate_content call. Base64 encoding is only required when calling the REST API directly; the SDK handles encoding for you.
Setup
Install
pip install google-generativeai
Env vars
GOOGLE_API_KEY
Imports
import os
import google.generativeai as genai
Examples
in: Send a PNG image file to Gemini for caption generation.
out: Response with a descriptive caption of the image.
in: Send a JPEG image to Gemini with a prompt to analyze the image content.
out: Response with detailed analysis or description of the image content.
in: Send an empty or corrupted image file to Gemini.
out: API returns an error indicating invalid image input.
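The failure case in the last example can often be caught locally before any tokens are spent. A minimal sketch using only the standard library (the `validate_image` helper is illustrative, not part of any SDK):

```python
import mimetypes
from pathlib import Path

def validate_image(path):
    """Return (bytes, mime_type) for a readable image file, or raise ValueError."""
    p = Path(path)
    if not p.is_file():
        raise ValueError(f"not a file: {path}")
    data = p.read_bytes()
    if not data:
        raise ValueError(f"empty image file: {path}")
    mime, _ = mimetypes.guess_type(p.name)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image type: {path}")
    return data, mime
```

This only checks that the file exists, is non-empty, and has an image extension; a corrupted-but-non-empty file will still be rejected server-side.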
Integration steps
- Install the google-generativeai SDK and set the GOOGLE_API_KEY environment variable.
- Read the image file in binary mode; the SDK accepts raw bytes, so no manual base64 step is needed.
- Configure the client with genai.configure(api_key=os.environ["GOOGLE_API_KEY"]) and create a GenerativeModel.
- Call model.generate_content with a list containing the prompt text and an image part ({"mime_type": ..., "data": <bytes>}).
- Receive the response from the Gemini API.
- Extract the generated text from response.text.
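Assuming the google-generativeai SDK's dict-style image parts ({"mime_type": ..., "data": <bytes>}), the content-assembly step above can be sketched as a small pure helper (`build_contents` is illustrative, not an SDK function):

```python
def build_contents(prompt, image_bytes, mime_type="image/png"):
    """Assemble the prompt text plus an inline image part for generate_content()."""
    return [prompt, {"mime_type": mime_type, "data": image_bytes}]

# Usage sketch (requires google-generativeai and a valid API key):
# import google.generativeai as genai
# model = genai.GenerativeModel("gemini-1.5-flash")
# response = model.generate_content(build_contents("Describe this image.", image_bytes))
```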
Full code
import os
import google.generativeai as genai

# Configure the client with the API key from the environment
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

# Load the image as raw bytes; the SDK handles encoding internally
image_path = "./example_image.png"
with open(image_path, "rb") as img_file:
    image_bytes = img_file.read()

# Send the prompt text and the image part together
response = model.generate_content(
    [
        "Describe this image.",
        {"mime_type": "image/png", "data": image_bytes},
    ],
    generation_config={"temperature": 0.7},
)

# Print the response text
print("Gemini response:", response.text)
API trace
Request
POST https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent
{"contents": [{"parts": [{"text": "Describe this image."}, {"inline_data": {"mime_type": "image/png", "data": "<base64-encoded-image>"}}]}], "generationConfig": {"temperature": 0.7}}
Response
{"candidates": [{"content": {"parts": [{"text": "This image shows a scenic mountain landscape with a clear blue sky and lush green trees."}], "role": "model"}}], "usageMetadata": {...}}
Extract
response.text
Variants
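Calling the REST API directly ›
When not using the Python SDK, the image genuinely must be base64-encoded inside an inline_data part, as in the trace above. A sketch of building that request body (field names follow the public generateContent REST format; the builder name is illustrative):

```python
import base64

def build_rest_payload(prompt, image_bytes, mime_type="image/png"):
    """Build a generateContent request body with a base64-encoded inline image."""
    return {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inline_data": {
                    "mime_type": mime_type,
                    # REST requires base64 text, unlike the SDK's raw bytes
                    "data": base64.b64encode(image_bytes).decode("utf-8"),
                }},
            ],
        }],
        "generationConfig": {"temperature": 0.7},
    }

# POST this JSON (e.g. with the requests package and a valid key) to:
# https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key=<GOOGLE_API_KEY>
```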
Streaming image analysis response ›
Use streaming when you want partial results as the model generates them for better user experience.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

# Load the image as raw bytes
image_path = "./example_image.png"
with open(image_path, "rb") as img_file:
    image_bytes = img_file.read()

# stream=True yields partial chunks as the model generates them
response = model.generate_content(
    [
        "Describe this image in detail.",
        {"mime_type": "image/png", "data": image_bytes},
    ],
    generation_config={"temperature": 0.7},
    stream=True,
)
for chunk in response:
    print(chunk.text, end="", flush=True)
Async image send with Gemini ›
Use async when integrating Gemini calls into an async Python application for concurrency.
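The concurrency benefit is easiest to see when several images are processed at once. Below is a minimal, SDK-independent sketch of the asyncio.gather pattern; `describe_image` is a placeholder standing in for a real call like the one in the example that follows:

```python
import asyncio

async def describe_image(path):
    # Placeholder: real code would await model.generate_content_async(...) here
    await asyncio.sleep(0)
    return f"description of {path}"

async def describe_all(paths):
    """Run one request per image concurrently; results keep input order."""
    return await asyncio.gather(*(describe_image(p) for p in paths))

results = asyncio.run(describe_all(["a.png", "b.png"]))
```

asyncio.gather preserves the order of its arguments, so results[i] always corresponds to paths[i] even if requests finish out of order.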
import os
import asyncio
import google.generativeai as genai

async def send_image_async():
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-flash")
    image_path = "./example_image.png"
    with open(image_path, "rb") as img_file:
        image_bytes = img_file.read()
    # generate_content_async awaits the call without blocking the event loop
    response = await model.generate_content_async(
        [
            "Analyze this image.",
            {"mime_type": "image/png", "data": image_bytes},
        ],
        generation_config={"temperature": 0.7},
    )
    print("Async Gemini response:", response.text)

asyncio.run(send_image_async())
Use gemini-2.0-flash for higher quality image understanding ›
Use the gemini-2.0-flash model for more advanced image understanding and detailed responses.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-2.0-flash")

image_path = "./example_image.png"
with open(image_path, "rb") as img_file:
    image_bytes = img_file.read()

response = model.generate_content(
    [
        "Provide a detailed description of this image.",
        {"mime_type": "image/png", "data": image_bytes},
    ],
    generation_config={"temperature": 0.5},
)
print("Gemini 2.0 response:", response.text)
Performance
Latency: ~1.2s for gemini-1.5-flash image requests
Cost: ~$0.015 per 1k tokens plus image processing fees (check official pricing)
Rate limits: default tier: 300 RPM / 10K TPM
- Keep image captions or prompts concise to reduce token usage.
- Use lower temperature for deterministic outputs to avoid extra tokens.
- Batch multiple images in one request if supported to save overhead.
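Using the ballpark figures above (illustrative only; always check official pricing), a back-of-envelope cost estimate is simple arithmetic:

```python
def estimate_cost(total_tokens, usd_per_1k_tokens=0.015):
    """Rough request cost from a token count, excluding image processing fees."""
    return total_tokens / 1000 * usd_per_1k_tokens

# e.g. a 500-token prompt+response at the ~$0.015/1k figure above:
# estimate_cost(500) ≈ $0.0075
```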
| Approach | Latency | Cost/call | Best for |
|---|---|---|---|
| Standard sync call | ~1.2s | ~$0.015 | Simple image captioning |
| Streaming response | ~1.2s initial + stream | ~$0.015 | Interactive UI with partial results |
| Async call | ~1.2s | ~$0.015 | Concurrent image processing in async apps |
| gemini-2.0-flash model | ~1.5s | ~$0.025 | High-quality detailed image analysis |
Quick tip
With the Python SDK, pass raw image bytes with the correct MIME type; base64 encoding is only needed when calling the REST API directly.
Common mistake
Beginners often omit the MIME type, pass a file path instead of the image bytes, or base64-encode bytes that the SDK expects raw, causing the API to reject the input.