Cheat Sheet intermediate · 8 min read

Google Gemini Cheat Sheet — Models, APIs & Patterns

version 2.5-pro (latest)

Google's frontier AI models for text, vision, and reasoning

GOOGLE_API_KEY

install pip install google-generativeai

core imports

python

from google.generativeai import GenerativeModel
import google.generativeai as genai

Mental model

Google's family of multimodal LLMs with vision, reasoning, and real-time capabilities.

Like smartphone tiers: Nano is the budget phone (fast, battery-efficient), Flash is the standard (balanced), Pro is the powerhouse (reasoning). You don't add a camera: it ships with one.

Key Concepts

Gemini 2.5 Pro

Flagship reasoning model with 2M context, vision, audio support, and tool use for complex multi-step tasks.

Gemini 2.0 Flash

Fast, efficient multimodal model optimized for real-time applications, vision understanding, and tool use with lower latency.

Gemini 1.5 Pro

Previous-generation flagship with 2M context tokens, strong at long-document analysis and complex reasoning.

Gemini 1.5 Flash

Lightweight model optimized for speed and cost-efficiency, supports 1M context tokens.

Gemini Nano

Ultra-small models (1B, 3B parameters) deployed on-device for privacy-sensitive applications with minimal latency.

Context Window

Maximum tokens of input history: Gemini Pro supports 2M (200K pages), enabling truly long-document RAG without chunking.

Native Multimodality

Images, video, audio, and text processed natively in a single API call: no separate embedding or preprocessing pipeline.

Tool Use / Function Calling

Model can request specific tools (APIs, code) to execute: structured output enabling agentic workflows.

When to Use Each Model

Model	Best For	Context	Speed	Cost
Gemini 2.5 Pro	Complex reasoning, long documents, multimodal analysis, agentic tasks	2M tokens	Slow (30-60s)	$$$
Gemini 2.0 Flash	Real-time chat, vision analysis, quick classification, high volume	1M tokens	Fast (2-5s)	$$
Gemini 1.5 Flash	Cost-sensitive, batch processing, simple Q&A, high throughput	1M tokens	Very fast (1-2s)	$
Gemini Nano	On-device, privacy-critical, latency-sensitive (phone/edge)	8K tokens	Instant (<100ms)	Free (on-device)

Google Gemini Patterns

01 Text generation with streaming

Simple Q&A, summarization, content generation

python

import google.generativeai as genai
import os

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-2.5-pro')

response = model.generate_content(
    'Explain quantum computing in 100 words',
    stream=True
)

for chunk in response:
    print(chunk.text, end='', flush=True)

output Quantum computing leverages quantum mechanics principles...

Streaming chunks may arrive out-of-order; always concatenate in sequence. Missing `stream=True` blocks until full response: use for real-time UX.

02 Image and video understanding

Document analysis, screenshot parsing, video transcription

python

from PIL import Image
import google.generativeai as genai
import os

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-2.0-flash')

# Image from file
img = Image.open('document.png')
response = model.generate_content(
    ['Extract all text and tables from this image', img]
)
print(response.text)

# Image from URL
response = model.generate_content(
    ['Describe this image', {'mime_type': 'image/jpeg', 'data': open('photo.jpg', 'rb').read()}]
)
print(response.text)

output The image contains a table with columns: Name, Age, Location...

PIL Images must be RGB or RGBA. TIFF/WebP fail silently. Video calls only work with `gemini-2.0-flash` or newer: 1.5 models reject video.

03 Long-context RAG (no chunking needed)

200+ page PDFs, entire codebase context, legal documents

python

import google.generativeai as genai
import os

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-1.5-pro')

# Load entire PDF as text (PDFPlumber, PyPDF2, etc.)
with open('contract.txt', 'r') as f:
    full_document = f.read()  # Can be 100K+ tokens

response = model.generate_content([
    f'Analyze this legal document:\n{full_document}',
    'List all liability clauses and their penalties. Highlight conflicts.'
])

print(response.text)

output Liability Clause 1 (Section 4.2): Vendor liable up to $100K...

2M context is billed at 4x the token rate of 128K context. Don't load Wikipedia. System prompts still count toward input tokens. No automatic RAG: manage chunks yourself for cost control.

04 Function calling for agentic workflows

Multi-step tasks, API orchestration, structured outputs

python

import google.generativeai as genai
from google.generativeai.types import tool
import os

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-2.5-pro')

# Define tools
tools = [
    tool.Tool(
        function_declarations=[
            {
                'name': 'get_weather',
                'description': 'Get current weather for a location',
                'parameters': {
                    'type': 'object',
                    'properties': {
                        'location': {'type': 'string'}
                    },
                    'required': ['location']
                }
            }
        ]
    )
]

response = model.generate_content(
    'What is the weather in San Francisco?',
    tools=tools
)

if response.tool_calls:
    for tool_call in response.tool_calls:
        print(f'Calling {tool_call.name} with {tool_call.args}')

output Calling get_weather with {'location': 'San Francisco'}

Tool use requires explicit tool schema definition. Model may invent tool calls: validate against your actual tools. Must manually execute tools and feed results back to model for multi-turn agentic loops.

05 Content safety settings

Production deployments, regulated industries, user-facing apps

python

import google.generativeai as genai
from google.generativeai.types import HarmCategory, HarmBlockThreshold
import os

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-2.0-flash')

response = model.generate_content(
    'Generate a story',
    safety_settings={
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
        HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
        HarmCategory.HARM_CATEGORY_SEXUAL_CONTENT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    }
)

if response.prompt_feedback.block_reason:
    print(f'Blocked: {response.prompt_feedback.block_reason}')

output Blocked: SAFETY

BLOCK_NONE still filters some content server-side: you can't fully disable safety. BLOCK_LOW_AND_ABOVE catches more; use judiciously. Blocked responses return empty text, not errors.

06 Multi-turn chat with history

Chatbots, conversational AI, stateful assistants

python

import google.generativeai as genai
import os

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-2.0-flash')

chat = model.start_chat(history=[])

# Turn 1
response1 = chat.send_message('My name is Alice')
print('Bot:', response1.text)

# Turn 2
response2 = chat.send_message('What is my name?')
print('Bot:', response2.text)

# View full history
for msg in chat.history:
    print(f"{msg.role}: {msg.parts[0].text}")

output

Bot: Nice to meet you, Alice!
Bot: Your name is Alice.

History persists in memory only: restart script loses it. No server-side session management. Each message includes full history (use token_count to track). Manually manage history for cost optimization in long conversations.

generate_content() and GenerativeModel Options

Core API Parameters

Parameter	Type	Default	Notes
`model`	string	gemini-2.0-flash	Use 'gemini-2.5-pro', 'gemini-2.0-flash', 'gemini-1.5-pro', or 'gemini-1.5-flash'
`temperature`	float	1.0	0.0 (deterministic) to 2.0 (creative). Use 0 for factual, 1.5+ for brainstorming
`top_p`	float	0.95	Nucleus sampling. Lower = focused. Keep with temperature for best results
`top_k`	int	40	Consider top K most likely tokens. Ignored if top_p set
`max_output_tokens`	int	None (model default)	Limits response length. Useful for cost control. Model may cut mid-sentence
`system_prompt`	string	None	System instructions. Not officially supported: use first message instead
`stream`	bool	False	Enable streaming. Returns iterator of response chunks
`tools`	list	[]	Function definitions for tool use. Must match tool.Tool() schema

Common Errors & Fixes

01 google.api_core.exceptions.InvalidArgument: 400 Invalid API Key

Cause: GOOGLE_API_KEY not set, wrong key, or API not enabled in Google Cloud Console.

Fix:

python

export GOOGLE_API_KEY='your-key-here'
# Then in Python:
import os
import google.generativeai as genai
genai.configure(api_key=os.environ['GOOGLE_API_KEY'])

02 google.api_core.exceptions.ResourceExhausted: 429 Rate limit exceeded

Cause: Hitting quota. Free tier: 60 requests/min, 1.5M tokens/day. Paid tier: higher but has per-region limits.

Fix:

python

import time
import google.generativeai as genai

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-2.0-flash')

for i in range(10):
    try:
        response = model.generate_content('Hello')
        print(response.text)
    except Exception as e:
        if '429' in str(e):
            time.sleep(2)  # Back off and retry
        else:
            raise

03 AttributeError: 'NoneType' object has no attribute 'text'

Cause: Response is None (blocked by safety filters) or generation failed silently.

Fix:

python

import google.generativeai as genai

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-2.0-flash')

response = model.generate_content('Your prompt')

if response is None or not response.candidates:
    print('Generation blocked or failed')
    print(f'Block reason: {response.prompt_feedback.block_reason if response else "Unknown"}')
else:
    print(response.text)

04 Image format not supported (TIFF, WebP silently fail)

Cause: Gemini accepts JPEG, PNG, GIF, WebP: but TIFF fails without error.

Fix:

python

from PIL import Image
import google.generativeai as genai

# Convert TIFF to PNG first
img = Image.open('document.tiff')
img_rgb = img.convert('RGB')
img_rgb.save('document.png')

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-2.0-flash')
response = model.generate_content(['Analyze:', Image.open('document.png')])
print(response.text)

05 Model 'gemini-pro' not found / 404

Cause: Using old model names. 'gemini-pro' and 'gemini-pro-vision' were deprecated mid-2024.

Fix:

python

import google.generativeai as genai

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])

# ✅ Use current models:
model = genai.GenerativeModel('gemini-2.5-pro')      # Reasoning
model = genai.GenerativeModel('gemini-2.0-flash')    # Fast
model = genai.GenerativeModel('gemini-1.5-pro')      # Long context
model = genai.GenerativeModel('gemini-1.5-flash')    # Budget

# List available models:
for m in genai.list_models():
    print(m.name)

Production Gotchas

⚠ Long context ≠ automatic RAG. You pay per token.

Gemini's 2M context is tempting but billing is 4x the 128K rate. Dumping 100K tokens costs ~$1.50 per request at pro rates. Strategy: use 1.5-flash for simple retrieval (cheaper), reserve pro for truly complex reasoning. Implement your own hybrid chunking + retrieval.

⚠ System prompts don't exist in the official API.

Google removed system_prompt support. Workaround: prepend system instructions to user's first message or bake them into the request context. This counts toward input tokens. CrewAI/LangChain wrappers may fake it via stored context.

⚠ Streaming chunks arrive out-of-order; buffering required.

With stream=True, chunks may arrive in race condition. Always collect the full response before processing: don't assume first chunk = complete thought. Use response.text after iteration finishes.

⚠ Blocked responses return empty candidates[], not errors.

Safety filters silently produce `response.candidates = []`. Your code must check `if response.candidates:` before accessing `.text`. No exception raised: app appears to hang or return null.

⚠ Tool use requires manual loop implementation.

Calling model.generate_content() with tools returns tool_calls: you must execute them and call send_message() again with results. No automatic agent loop. Multi-step reasoning requires 3+ API calls.

⚠ Free tier quota resets daily, not hourly.

60 req/min, 1.5M tokens/day. After hitting 1.5M tokens, you're blocked until midnight UTC. No carryover. Production apps need paid API key immediately.

⚠ Vision models see the image, but may hallucinate details.

Gemini 2.0-flash does OCR well but invents text not in images (especially with small fonts). Always validate extracted data against source. 'Describe this' ≠ 'extract every word exactly.'

⚠ Context window does NOT reduce hallucinations.

Larger context ≠ more accurate. Gemini can still invent facts even with 2M context. Use retrieval-augmented generation (RAG) properly: feed facts as context, use lower temperature for factual tasks.

Verified 2026-04 · vgemini-2.5-pro (latest) · gemini-2.5-pro, gemini-2.0-flash, gemini-1.5-pro

Verify ↗

Community Notes

No notes yetBe the first to share a version-specific fix or tip.