API Beginner easy · 4 min

Gemini model family: Flash, Pro, Ultra

What you will learn

Choose between Gemini Flash (fast, cheap), Pro (balanced), and Ultra (most capable) based on latency and accuracy requirements for your task.

Why this matters

Model selection is your first decision in every Gemini API call: picking the wrong model wastes money, adds latency, or fails at complex reasoning tasks. Understanding the tradeoffs prevents production incidents.

Skip if: You need a multimodal input type that Gemini doesn't support, in which case use a different provider's API (note that Gemini 2.0 Flash now handles image, video, and audio). Use local models instead if your latency budget is under 100 ms or you cannot send data to Google's servers.

Explanation

The Gemini model family offers three tiers, each optimized for different tradeoffs: Flash prioritizes speed and cost (ideal for chat, summarization, classification), Pro balances capability with latency (general-purpose tasks), and Ultra delivers maximum reasoning power (complex analysis, code generation, multi-step problems).

Under the hood: The tiers are different models with different sizes and training objectives. Flash is a lightweight model tuned for throughput; Pro is the general-purpose model; Ultra is a larger variant that is more expensive to train and serve. Google's infrastructure routes requests to regional endpoints, and on top of that network time, the model you choose determines both response time and cost per 1M tokens.

When to use each: Flash for tight latency budgets and high volume (chatbots, moderation). Pro when you don't know which model you need (it's the safest default). Ultra when you're stuck: if Pro fails at a reasoning task, Ultra will likely succeed. Never use Ultra for simple tasks: at the rates listed under Cost, it costs 2x as much as Pro and 40x as much as Flash.
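The guidance above can be written as a small lookup. This is only a sketch: `pick_model` and its two flags are hypothetical names introduced for illustration, and the model IDs are the ones this lesson uses in its request code.

```python
# Model IDs as used in this lesson's request code; they may change over time.
MODEL_IDS = {
    "flash": "gemini-2.0-flash",
    "pro": "gemini-2.5-pro",
    "ultra": "gemini-2.5-pro-exp-05-21",
}


def pick_model(latency_sensitive: bool, pro_failed: bool = False) -> str:
    """Return a model ID following the 'When to use each' guidance.

    latency_sensitive: high-volume or tight-latency work (chatbots, moderation).
    pro_failed: Pro was already tried on this task and its reasoning fell short.
    """
    if latency_sensitive:
        return MODEL_IDS["flash"]
    if pro_failed:
        return MODEL_IDS["ultra"]
    # No strong constraint known: Pro is the lesson's default.
    return MODEL_IDS["pro"]


print(pick_model(latency_sensitive=True))
print(pick_model(latency_sensitive=False, pro_failed=True))
```

Keeping the decision in one function means a later change of tier touches a single place instead of every call site.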

Request code

python

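import os

import google.generativeai as genai

# Configure the SDK once, reading the key from the environment.
genai.configure(api_key=os.environ['GOOGLE_API_KEY'])

# Tier name -> model ID. IDs change over time; check the current model list.
models = {
    'flash': 'gemini-2.0-flash',
    'pro': 'gemini-2.5-pro',
    'ultra': 'gemini-2.5-pro-exp-05-21',
}

test_prompt = 'Explain quantum entanglement in one sentence.'

# Send the same prompt to each tier and compare the answers.
for model_name, model_id in models.items():
    model = genai.GenerativeModel(model_id)
    response = model.generate_content(test_prompt)
    print(f'{model_name.upper()}: {response.text[:80]}...')
    # finish_reason is an enum; .name prints STOP or MAX_TOKENS instead of a number.
    print(f'  Finish reason: {response.candidates[0].finish_reason.name}')
    print()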

Authentication

Set your Google API key before you start Python. Create a key at https://aistudio.google.com/app/apikeys. Then:

```bash
export GOOGLE_API_KEY='your-key-here'
```

The google-generativeai SDK reads this environment variable automatically when genai.configure() is called without an api_key argument.

Response shape

Field	Description
`text`	The generated text response
`candidates`	List of response alternatives (usually 1 item)
`candidates[0].finish_reason`	STOP (completed) or MAX_TOKENS (hit limit); other values such as SAFETY also occur
`candidates[0].content.parts[0].text`	Alternative way to access the text
`usage_metadata`	Token counts for input and output
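As a rough illustration of the table above, the helper below pulls those fields into a plain dict. `summarize_response` is a hypothetical name, and the token-count attribute names (`prompt_token_count`, `candidates_token_count`) are what the SDK is assumed to expose on `usage_metadata`; verify them against your installed version. A stand-in object is used so the sketch runs without an API call.

```python
from types import SimpleNamespace


def summarize_response(response) -> dict:
    """Collect the fields from the response-shape table into a plain dict."""
    candidate = response.candidates[0]
    reason = candidate.finish_reason
    usage = response.usage_metadata
    return {
        # Same text that response.text returns for a single-part reply.
        "text": candidate.content.parts[0].text,
        # Enum members expose .name; fall back to str() for plain values.
        "finish_reason": getattr(reason, "name", str(reason)),
        # Attribute names below are assumptions about the SDK's usage_metadata.
        "input_tokens": usage.prompt_token_count,
        "output_tokens": usage.candidates_token_count,
    }


# Stand-in shaped like a real response, so the example runs offline.
fake = SimpleNamespace(
    candidates=[SimpleNamespace(
        finish_reason="STOP",
        content=SimpleNamespace(parts=[SimpleNamespace(text="Hello")]),
    )],
    usage_metadata=SimpleNamespace(prompt_token_count=7, candidates_token_count=1),
)
print(summarize_response(fake))
```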

Field guide

finish_reason

STOP means normal completion. MAX_TOKENS means the model ran out of its output length budget; increase max_output_tokens if this happens.

usage_metadata

Input and output token counts determine billing: multiply each by the model-specific rate (Flash is cheapest; at the rates listed under Cost, Ultra is 40x the price of Flash).

candidates

Multiple candidates exist only if you set candidate_count > 1; beginners should leave this at 1 (default)
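The finish_reason advice above (raise max_output_tokens when you see MAX_TOKENS) might be automated along these lines. This is a sketch under assumptions: `generate_with_growing_budget` and the `generate(prompt, budget)` callable are hypothetical names, and a fake generator stands in for the API so the example runs offline.

```python
from types import SimpleNamespace


def generate_with_growing_budget(generate, prompt, start_tokens=256, max_tokens=2048):
    """Retry with a doubled output budget while the reply is cut off.

    generate(prompt, budget) is a hypothetical callable that returns a response.
    Returns (response, budget_used).
    """
    budget = start_tokens
    while True:
        response = generate(prompt, budget)
        reason = response.candidates[0].finish_reason
        # Enum members expose .name; fall back to str() for plain values.
        reason_name = getattr(reason, "name", str(reason))
        if reason_name != "MAX_TOKENS" or budget >= max_tokens:
            return response, budget
        budget = min(budget * 2, max_tokens)


# Offline stand-in: pretends the reply only fits once 1024 tokens are allowed.
def fake_generate(prompt, budget):
    reason = "MAX_TOKENS" if budget < 1024 else "STOP"
    return SimpleNamespace(candidates=[SimpleNamespace(finish_reason=reason)])


response, used = generate_with_growing_budget(fake_generate, "Summarize this report.")
print(used)
```

With the real SDK, `generate` could wrap `model.generate_content(prompt, generation_config={"max_output_tokens": budget})`. Every retry is billed as a new request, so the cap in `max_tokens` also caps the cost.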

Setup trap

The SDK reads GOOGLE_API_KEY only when genai.configure() runs. If you set the variable after calling genai.configure() with no argument, the SDK won't pick it up. Set the environment variable first, then call genai.configure(api_key=os.environ['GOOGLE_API_KEY']) once, before your main logic runs. The SDK does NOT re-read the env var on every request.
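One way to avoid this trap is to check the key and configure the SDK in a single startup step, so a missing variable fails immediately with a readable message. `read_api_key` and `configure_once` are hypothetical helper names; the only SDK call used is the `genai.configure(api_key=...)` shown in this lesson.

```python
import os


def read_api_key(env=os.environ) -> str:
    """Return the API key, or raise a clear error before any request is made."""
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GOOGLE_API_KEY is not set; export it before starting Python"
        )
    return key


def configure_once() -> None:
    """Configure the SDK a single time at startup."""
    # Imported here so the key check can be exercised without the SDK installed.
    import google.generativeai as genai

    genai.configure(api_key=read_api_key())
```

Call `configure_once()` at the top of your entry point, before any model object is created.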

Cost

Flash: $0.075 per 1M input tokens, $0.30 per 1M output tokens. Pro: $1.50 per 1M input, $6.00 per 1M output. Ultra: $3.00 per 1M input, $12.00 per 1M output. A request with 1,000 input tokens and 1,000 output tokens costs ~$0.015 on Ultra and ~$0.000375 on Flash. Using Ultra instead of Flash for simple classification queries multiplies your bill by 40x.
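The arithmetic behind these figures can be checked with a few lines of code. The rates are copied from this section; `request_cost` is a hypothetical helper, and the example assumes a request with 1,000 input tokens and 1,000 output tokens.

```python
# USD per 1M tokens, copied from the Cost section: (input rate, output rate).
RATES = {
    "flash": (0.075, 0.30),
    "pro": (1.50, 6.00),
    "ultra": (3.00, 12.00),
}


def request_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request on the given tier."""
    in_rate, out_rate = RATES[tier]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Compare the three tiers on the same request size.
for tier in RATES:
    print(f"{tier}: ${request_cost(tier, 1000, 1000):.6f}")
```

In a real application the two token counts would come from the response's `usage_metadata` rather than being fixed numbers.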

Rate limits

Flash handles higher throughput (1500 requests/minute free tier) than Ultra (100 requests/minute free tier). If you're load-testing or running batch jobs, Flash won't throttle you as quickly.
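For batch jobs, a simple client-side throttle can keep you under a requests-per-minute limit instead of waiting for the API to reject calls. The class below is a minimal sketch; `MinIntervalThrottle` is a hypothetical name, and the limit you pass in should be whatever your tier currently allows.

```python
import time


class MinIntervalThrottle:
    """Space calls evenly so they stay under a requests-per-minute budget."""

    def __init__(self, requests_per_minute: int, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / requests_per_minute
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # no call made yet

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self.clock()
        delay = 0.0
        if self.next_allowed is not None and now < self.next_allowed:
            delay = self.next_allowed - now
            self.sleep(delay)
            now = self.next_allowed
        # Reserve the next slot one interval after this call.
        self.next_allowed = now + self.interval
        return delay
```

Call `throttle.wait()` immediately before each request. The `clock` and `sleep` parameters exist so the pacing logic can be tested without real delays.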

Common gotcha

Specifying an outdated model ID (e.g., 'gemini-pro' or 'gemini-1.5-pro') will throw a 400 error. Google renamed models in late 2024; the current names are gemini-2.0-flash, gemini-2.5-pro, and gemini-2.5-pro-exp-05-21. Code copy-pasted from old tutorials fails for this reason.

Error recovery

ValueError: Invalid model: gemini-pro

Model name is outdated. Use 'gemini-2.0-flash' instead. Check https://ai.google.dev/models for current names.

google.api_core.exceptions.InvalidArgument: 400 Invalid API key

GOOGLE_API_KEY env var is missing or malformed. Verify it exists: `echo $GOOGLE_API_KEY` in bash. Restart your Python process after setting it.

google.api_core.exceptions.ResourceExhausted: 429 Resource exhausted

Rate limit hit. Implement exponential backoff (wait 1s, then 2s, then 4s) or use Flash instead of Ultra to stay under limits.
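The backoff schedule described above (1s, then 2s, then 4s) could be sketched as a small wrapper. `call_with_backoff` is a hypothetical helper; it takes the exception type to retry on as a parameter so the sketch stays runnable without the SDK.

```python
import time


def call_with_backoff(fn, retry_on, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a retry_on exception wait 1s, 2s, 4s, ... and try again."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            # Delay doubles each time: base_delay * 1, 2, 4, ...
            sleep(base_delay * 2 ** attempt)
```

With the SDK, this would typically be called with `retry_on=google.api_core.exceptions.ResourceExhausted` and `fn=lambda: model.generate_content(prompt)`. Adding a small random jitter to each delay is a common refinement when many workers retry at once.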

Experienced dev note

Start every project with Flash and only upgrade to Pro or Ultra if Flash fails (timeout, reasoning error, or unacceptable output quality). At the rates listed under Cost, Flash is 95% cheaper than Pro per token, so every task Flash can handle saves most of its bill. Also, model names change frequently, so don't scatter hardcoded IDs through your code. Query genai.list_models() once at startup, cache the available model IDs, and fail explicitly if your target model disappears.
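A startup check along those lines might look like the sketch below. `resolve_model` and `load_available_models` are hypothetical helpers; the sketch assumes `genai.list_models()` yields objects with a `name` such as "models/gemini-2.0-flash" and a `supported_generation_methods` list, which you should confirm against your SDK version.

```python
def resolve_model(target: str, available_names) -> str:
    """Return the full model name for target, or fail loudly if it is gone."""
    # Assumed naming: list_models() reports names with a "models/" prefix.
    wanted = target if target.startswith("models/") else f"models/{target}"
    names = set(available_names)
    if wanted not in names:
        raise LookupError(f"{target} is not available; found {sorted(names)}")
    return wanted


def load_available_models() -> list:
    """Fetch the model list once; call at startup and cache the result."""
    # Imported lazily so resolve_model can be tested without the SDK or network.
    import google.generativeai as genai

    return [
        m.name
        for m in genai.list_models()
        if "generateContent" in m.supported_generation_methods
    ]
```

At startup you would call `load_available_models()` once, keep the result, and pass it to `resolve_model` for each model ID your application depends on.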

Check your understanding

If you're building a high-volume chatbot that needs <200ms latency and processes 10M messages/month, which model should you choose and why? What would happen to your budget if you picked the 'safest' option instead?

Answer hint

Flash has the lowest latency and cost; Pro and Ultra add latency and multiply costs. The 'safest default' Pro would cost 20x more for the same task. Match model to constraint (latency or reasoning complexity), not to safety.

VERSION As of April 2026, gemini-pro and gemini-1.5-pro are deprecated; use gemini-2.0-flash or gemini-2.5-pro instead. Ultra models are experimental and may require different endpoints. Check https://ai.google.dev/models for the latest stable names.
