Cheat Sheet intermediate · 8 min read

Google Gemini Cheat Sheet — Models, APIs & Patterns

version 2.5-pro (latest)

Google's frontier AI models for text, vision, and reasoning

GOOGLE_API_KEY
install pip install google-generativeai
core imports
python
from google.generativeai import GenerativeModel
import google.generativeai as genai
Mental model

Google's family of multimodal LLMs with vision, reasoning, and real-time capabilities.

Like smartphone tiers: Nano is the budget phone (fast, battery-efficient), Flash is the standard (balanced), Pro is the powerhouse (reasoning). You don't add a camera: it ships with one.

Key Concepts

Gemini 2.5 Pro
Flagship reasoning model with 2M context, vision, audio support, and tool use for complex multi-step tasks.
Gemini 2.0 Flash
Fast, efficient multimodal model optimized for real-time applications, vision understanding, and tool use with lower latency.
Gemini 1.5 Pro
Previous-generation flagship with 2M context tokens, strong at long-document analysis and complex reasoning.
Gemini 1.5 Flash
Lightweight model optimized for speed and cost-efficiency, supports 1M context tokens.
Gemini Nano
Ultra-small models (1B, 3B parameters) deployed on-device for privacy-sensitive applications with minimal latency.
Context Window
Maximum tokens of input history: Gemini Pro supports 2M (200K pages), enabling truly long-document RAG without chunking.
Native Multimodality
Images, video, audio, and text processed natively in a single API call: no separate embedding or preprocessing pipeline.
Tool Use / Function Calling
Model can request specific tools (APIs, code) to execute: structured output enabling agentic workflows.

When to Use Each Model

ModelBest ForContextSpeedCost
Gemini 2.5 ProComplex reasoning, long documents, multimodal analysis, agentic tasks2M tokensSlow (30-60s)$$$
Gemini 2.0 FlashReal-time chat, vision analysis, quick classification, high volume1M tokensFast (2-5s)$$
Gemini 1.5 FlashCost-sensitive, batch processing, simple Q&A, high throughput1M tokensVery fast (1-2s)$
Gemini NanoOn-device, privacy-critical, latency-sensitive (phone/edge)8K tokensInstant (<100ms)Free (on-device)

Google Gemini Patterns

01 Text generation with streaming
Simple Q&A, summarization, content generation
python
import google.generativeai as genai
import os

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-2.5-pro')

response = model.generate_content(
    'Explain quantum computing in 100 words',
    stream=True
)

for chunk in response:
    print(chunk.text, end='', flush=True)
output Quantum computing leverages quantum mechanics principles...
Streaming chunks may arrive out-of-order; always concatenate in sequence. Missing `stream=True` blocks until full response: use for real-time UX.
02 Image and video understanding
Document analysis, screenshot parsing, video transcription
python
from PIL import Image
import google.generativeai as genai
import os

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-2.0-flash')

# Image from file
img = Image.open('document.png')
response = model.generate_content(
    ['Extract all text and tables from this image', img]
)
print(response.text)

# Image from URL
response = model.generate_content(
    ['Describe this image', {'mime_type': 'image/jpeg', 'data': open('photo.jpg', 'rb').read()}]
)
print(response.text)
output The image contains a table with columns: Name, Age, Location...
PIL Images must be RGB or RGBA. TIFF/WebP fail silently. Video calls only work with `gemini-2.0-flash` or newer: 1.5 models reject video.
03 Long-context RAG (no chunking needed)
200+ page PDFs, entire codebase context, legal documents
python
import google.generativeai as genai
import os

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-1.5-pro')

# Load entire PDF as text (PDFPlumber, PyPDF2, etc.)
with open('contract.txt', 'r') as f:
    full_document = f.read()  # Can be 100K+ tokens

response = model.generate_content([
    f'Analyze this legal document:\n{full_document}',
    'List all liability clauses and their penalties. Highlight conflicts.'
])

print(response.text)
output Liability Clause 1 (Section 4.2): Vendor liable up to $100K...
2M context is billed at 4x the token rate of 128K context. Don't load Wikipedia. System prompts still count toward input tokens. No automatic RAG: manage chunks yourself for cost control.
04 Function calling for agentic workflows
Multi-step tasks, API orchestration, structured outputs
python
import google.generativeai as genai
from google.generativeai.types import tool
import os

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-2.5-pro')

# Define tools
tools = [
    tool.Tool(
        function_declarations=[
            {
                'name': 'get_weather',
                'description': 'Get current weather for a location',
                'parameters': {
                    'type': 'object',
                    'properties': {
                        'location': {'type': 'string'}
                    },
                    'required': ['location']
                }
            }
        ]
    )
]

response = model.generate_content(
    'What is the weather in San Francisco?',
    tools=tools
)

if response.tool_calls:
    for tool_call in response.tool_calls:
        print(f'Calling {tool_call.name} with {tool_call.args}')
output Calling get_weather with {'location': 'San Francisco'}
Tool use requires explicit tool schema definition. Model may invent tool calls: validate against your actual tools. Must manually execute tools and feed results back to model for multi-turn agentic loops.
05 Content safety settings
Production deployments, regulated industries, user-facing apps
python
import google.generativeai as genai
from google.generativeai.types import HarmCategory, HarmBlockThreshold
import os

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-2.0-flash')

response = model.generate_content(
    'Generate a story',
    safety_settings={
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
        HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
        HarmCategory.HARM_CATEGORY_SEXUAL_CONTENT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    }
)

if response.prompt_feedback.block_reason:
    print(f'Blocked: {response.prompt_feedback.block_reason}')
output Blocked: SAFETY
BLOCK_NONE still filters some content server-side: you can't fully disable safety. BLOCK_LOW_AND_ABOVE catches more; use judiciously. Blocked responses return empty text, not errors.
06 Multi-turn chat with history
Chatbots, conversational AI, stateful assistants
python
import google.generativeai as genai
import os

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-2.0-flash')

chat = model.start_chat(history=[])

# Turn 1
response1 = chat.send_message('My name is Alice')
print('Bot:', response1.text)

# Turn 2
response2 = chat.send_message('What is my name?')
print('Bot:', response2.text)

# View full history
for msg in chat.history:
    print(f"{msg.role}: {msg.parts[0].text}")
output Bot: Nice to meet you, Alice! Bot: Your name is Alice.
History persists in memory only: restart script loses it. No server-side session management. Each message includes full history (use token_count to track). Manually manage history for cost optimization in long conversations.

generate_content() and GenerativeModel Options

Core API Parameters

ParameterTypeDefaultNotes
model string gemini-2.0-flash Use 'gemini-2.5-pro', 'gemini-2.0-flash', 'gemini-1.5-pro', or 'gemini-1.5-flash'
temperature float 1.0 0.0 (deterministic) to 2.0 (creative). Use 0 for factual, 1.5+ for brainstorming
top_p float 0.95 Nucleus sampling. Lower = focused. Keep with temperature for best results
top_k int 40 Consider top K most likely tokens. Ignored if top_p set
max_output_tokens int None (model default) Limits response length. Useful for cost control. Model may cut mid-sentence
system_prompt string None System instructions. Not officially supported: use first message instead
stream bool False Enable streaming. Returns iterator of response chunks
tools list [] Function definitions for tool use. Must match tool.Tool() schema

Common Errors & Fixes

01 google.api_core.exceptions.InvalidArgument: 400 Invalid API Key

Cause: GOOGLE_API_KEY not set, wrong key, or API not enabled in Google Cloud Console.

Fix:
python
export GOOGLE_API_KEY='your-key-here'
# Then in Python:
import os
import google.generativeai as genai
genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
02 google.api_core.exceptions.ResourceExhausted: 429 Rate limit exceeded

Cause: Hitting quota. Free tier: 60 requests/min, 1.5M tokens/day. Paid tier: higher but has per-region limits.

Fix:
python
import time
import google.generativeai as genai

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-2.0-flash')

for i in range(10):
    try:
        response = model.generate_content('Hello')
        print(response.text)
    except Exception as e:
        if '429' in str(e):
            time.sleep(2)  # Back off and retry
        else:
            raise
03 AttributeError: 'NoneType' object has no attribute 'text'

Cause: Response is None (blocked by safety filters) or generation failed silently.

Fix:
python
import google.generativeai as genai

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-2.0-flash')

response = model.generate_content('Your prompt')

if response is None or not response.candidates:
    print('Generation blocked or failed')
    print(f'Block reason: {response.prompt_feedback.block_reason if response else "Unknown"}')
else:
    print(response.text)
04 Image format not supported (TIFF, WebP silently fail)

Cause: Gemini accepts JPEG, PNG, GIF, WebP: but TIFF fails without error.

Fix:
python
from PIL import Image
import google.generativeai as genai

# Convert TIFF to PNG first
img = Image.open('document.tiff')
img_rgb = img.convert('RGB')
img_rgb.save('document.png')

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-2.0-flash')
response = model.generate_content(['Analyze:', Image.open('document.png')])
print(response.text)
05 Model 'gemini-pro' not found / 404

Cause: Using old model names. 'gemini-pro' and 'gemini-pro-vision' were deprecated mid-2024.

Fix:
python
import google.generativeai as genai

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])

# ✅ Use current models:
model = genai.GenerativeModel('gemini-2.5-pro')      # Reasoning
model = genai.GenerativeModel('gemini-2.0-flash')    # Fast
model = genai.GenerativeModel('gemini-1.5-pro')      # Long context
model = genai.GenerativeModel('gemini-1.5-flash')    # Budget

# List available models:
for m in genai.list_models():
    print(m.name)

Production Gotchas

Long context ≠ automatic RAG. You pay per token.

Gemini's 2M context is tempting but billing is 4x the 128K rate. Dumping 100K tokens costs ~$1.50 per request at pro rates. Strategy: use 1.5-flash for simple retrieval (cheaper), reserve pro for truly complex reasoning. Implement your own hybrid chunking + retrieval.

System prompts don't exist in the official API.

Google removed system_prompt support. Workaround: prepend system instructions to user's first message or bake them into the request context. This counts toward input tokens. CrewAI/LangChain wrappers may fake it via stored context.

Streaming chunks arrive out-of-order; buffering required.

With stream=True, chunks may arrive in race condition. Always collect the full response before processing: don't assume first chunk = complete thought. Use response.text after iteration finishes.

Blocked responses return empty candidates[], not errors.

Safety filters silently produce `response.candidates = []`. Your code must check `if response.candidates:` before accessing `.text`. No exception raised: app appears to hang or return null.

Tool use requires manual loop implementation.

Calling model.generate_content() with tools returns tool_calls: you must execute them and call send_message() again with results. No automatic agent loop. Multi-step reasoning requires 3+ API calls.

Free tier quota resets daily, not hourly.

60 req/min, 1.5M tokens/day. After hitting 1.5M tokens, you're blocked until midnight UTC. No carryover. Production apps need paid API key immediately.

Vision models see the image, but may hallucinate details.

Gemini 2.0-flash does OCR well but invents text not in images (especially with small fonts). Always validate extracted data against source. 'Describe this' ≠ 'extract every word exactly.'

Context window does NOT reduce hallucinations.

Larger context ≠ more accurate. Gemini can still invent facts even with 2M context. Use retrieval-augmented generation (RAG) properly: feed facts as context, use lower temperature for factual tasks.

Verified 2026-04 · vgemini-2.5-pro (latest) · gemini-2.5-pro, gemini-2.0-flash, gemini-1.5-pro
Verify ↗

Community Notes

No notes yetBe the first to share a version-specific fix or tip.