Gemini model family: Flash, Pro, Ultra
Why this matters
Model selection is your first decision in every Gemini API call: picking the wrong model wastes money, adds latency, or fails at complex reasoning tasks. Understanding the tradeoffs prevents production incidents.
Explanation
The Gemini model family offers three tiers, each optimized for different tradeoffs: Flash prioritizes speed and cost (ideal for chat, summarization, classification), Pro balances capability with latency (general-purpose tasks), and Ultra delivers maximum reasoning power (complex analysis, code generation, multi-step problems).
Under the hood: These are different models with different parameter counts and training objectives. Flash uses a lightweight architecture trained for throughput; Pro is the foundational model; Ultra is a larger variant that is more expensive to train and serve. Google's infrastructure routes requests to regional endpoints, and model choice affects both response time and cost per 1M tokens.
When to use each: Flash for sub-100ms latency requirements and high volume (chatbots, moderation). Pro when you don't know which model you need (it's the safest default). Ultra when you're stuck: if Pro fails at a reasoning task, Ultra will likely succeed. Never use Ultra for simple tasks: at the rates listed under Cost, it is 2x the price of Pro and 40x the price of Flash.
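The selection rules above can be sketched as a small helper. This is only an illustration: `choose_tier` and its parameters are hypothetical names, not part of any SDK, and the 100 ms threshold is taken directly from the guidance above.

```python
def choose_tier(latency_budget_ms=None, high_volume=False, pro_failed=False):
    """Return 'flash', 'pro', or 'ultra' following the guidance above.

    All parameter names and thresholds are illustrative assumptions.
    """
    # Tight latency budgets or high request volume point to Flash.
    if high_volume or (latency_budget_ms is not None and latency_budget_ms <= 100):
        return "flash"
    # Escalate to Ultra only after Pro has already failed at the task.
    if pro_failed:
        return "ultra"
    # With no other constraint, Pro is the default.
    return "pro"


print(choose_tier(latency_budget_ms=80))   # latency-bound workload
print(choose_tier(pro_failed=True))        # reasoning task Pro could not solve
```

Encoding the rules in one function keeps the decision in a single place, so a change in guidance does not have to be chased through every call site.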
Request code
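import os

import google.generativeai as genai

# Read the key from the environment rather than hardcoding it.
genai.configure(api_key=os.environ['GOOGLE_API_KEY'])

# Tier label -> model ID. Note that the 'ultra' entry is an experimental Pro
# build ID; confirm every ID with genai.list_models() before relying on it.
models = {
    'flash': 'gemini-2.0-flash',
    'pro': 'gemini-2.5-pro',
    'ultra': 'gemini-2.5-pro-exp-05-21',
}

test_prompt = 'Explain quantum entanglement in one sentence.'

for model_name, model_id in models.items():
    model = genai.GenerativeModel(model_id)
    response = model.generate_content(test_prompt)
    print(f'{model_name.upper()}: {response.text[:80]}...')
    # finish_reason is an enum; .name gives the readable label (e.g. STOP).
    print(f'  Stop reason: {response.candidates[0].finish_reason.name}')
    print()

Authentication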
Set your Google API key before calling genai.configure(). Create a key at https://aistudio.google.com/app/apikeys. Then:
```bash
export GOOGLE_API_KEY='your-key-here'
```
When genai.configure() is called without an api_key argument, the google-generativeai SDK reads this environment variable automatically. The request code above passes the key explicitly instead.
Response shape
| Field | Description |
|---|---|
| text | The generated text response |
| candidates | List of response alternatives (usually 1 item) |
| candidates[0].finish_reason | STOP (completed) or MAX_TOKENS (hit limit) |
| candidates[0].content.parts[0].text | Alternative way to access the text |
| usage_metadata | Token counts for input and output |
Field guide
finish_reason: STOP means normal completion; MAX_TOKENS means the model ran out of output length budget. If this happens, increase max_output_tokens in the generation config.
usage_metadata: Input and output token counts determine billing. Multiply each by the model-specific rate (Flash is cheapest; at the rates listed under Cost, Pro is 20x and Ultra is 40x the Flash price).
candidates: Multiple candidates exist only if you set candidate_count > 1; beginners should leave it at the default of 1.
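The fields described above can be pulled into one small summary, sketched below. `summarize_response` is a hypothetical helper, and the attribute names `prompt_token_count` and `candidates_token_count` are what the SDK's usage metadata exposes as far as I know; verify them against your installed version. A stand-in object replaces a real API response so the sketch runs without a key.

```python
from types import SimpleNamespace


def summarize_response(response):
    """Collect finish reason and token counts into a plain dict."""
    candidate = response.candidates[0]
    reason = candidate.finish_reason
    # finish_reason is an enum in the SDK; fall back to str() for other types.
    reason_name = getattr(reason, "name", str(reason))
    usage = response.usage_metadata
    return {
        "finish_reason": reason_name,
        "truncated": reason_name == "MAX_TOKENS",
        "input_tokens": usage.prompt_token_count,
        "output_tokens": usage.candidates_token_count,
    }


# Stand-in for demonstration; a real call would pass the object returned by
# model.generate_content(...).
fake = SimpleNamespace(
    candidates=[SimpleNamespace(finish_reason="MAX_TOKENS")],
    usage_metadata=SimpleNamespace(prompt_token_count=12, candidates_token_count=256),
)
summary = summarize_response(fake)
if summary["truncated"]:
    print("Output was cut off; consider raising max_output_tokens.")
```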
Setup trap
If you set GOOGLE_API_KEY after calling genai.configure() with no argument, the SDK won't pick it up. Either set the environment variable first and then call genai.configure(api_key=os.environ['GOOGLE_API_KEY']), or run configuration before your main logic. The SDK does not re-read the environment variable on every request.
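One way to make this trap fail loudly is to check for the key yourself before configuring. The sketch below assumes a hypothetical helper named `require_api_key`; it is not part of the SDK.

```python
import os


def require_api_key(env=os.environ, var_name="GOOGLE_API_KEY"):
    """Return the API key, or raise a clear error if it is missing or blank."""
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before calling genai.configure()."
        )
    return key


# Intended use (requires the google-generativeai package):
#   import google.generativeai as genai
#   genai.configure(api_key=require_api_key())
```

Failing at startup with a named variable is usually easier to debug than an authentication error on the first request.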
Cost
Flash: $0.075 per 1M input tokens, $0.30 per 1M output tokens. Pro: $1.50 per 1M input, $6.00 per 1M output. Ultra: $3.00 per 1M input, $12.00 per 1M output. A request with 1,000 input tokens and 1,000 output tokens costs ~$0.015 on Ultra and ~$0.000375 on Flash. Using Ultra for simple classification queries multiplies your bill by 40x.
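The arithmetic can be checked with a short calculator. The rates below are copied from the figures in this section and should be treated as assumptions that may be out of date; `estimate_cost` is a hypothetical helper, and the example assumes 1,000 input and 1,000 output tokens per request.

```python
# USD per 1M tokens, copied from the figures in this section (assumed current).
RATES = {
    "flash": {"input": 0.075, "output": 0.30},
    "pro": {"input": 1.50, "output": 6.00},
    "ultra": {"input": 3.00, "output": 12.00},
}


def estimate_cost(tier, input_tokens, output_tokens):
    """Return the estimated cost in USD for one request on the given tier."""
    rate = RATES[tier]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000


flash = estimate_cost("flash", 1000, 1000)
ultra = estimate_cost("ultra", 1000, 1000)
print(f"Flash: ${flash:.6f}  Ultra: ${ultra:.6f}  ratio: {ultra / flash:.0f}x")
```

Because input and output are billed at different rates, both counts from usage_metadata are needed for an accurate estimate.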
Rate limits
Flash handles higher throughput (1500 requests/minute free tier) than Ultra (100 requests/minute free tier). If you're load-testing or running batch jobs, Flash won't throttle you as quickly.
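When a request is throttled (the 429 ResourceExhausted error listed under Error recovery), a common response is to retry with exponential backoff. The sketch below is generic and does not import the SDK: `call_with_backoff` is a hypothetical helper, and the exception type to retry on is passed in by the caller.

```python
import random
import time


def call_with_backoff(call, retryable, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call() and retry with exponential backoff on `retryable` errors."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            # Delay doubles on each attempt; jitter avoids synchronized retries.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))


# Intended use (requires google-generativeai and google-api-core):
#   from google.api_core.exceptions import ResourceExhausted
#   response = call_with_backoff(
#       lambda: model.generate_content(prompt), ResourceExhausted)
```

The `sleep` parameter exists only so the waiting can be replaced in tests; production code can leave it at the default.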
Common gotcha
Specifying a wrong or retired model ID (e.g., 'gemini-pro' or 'gemini-1.5-pro') fails with an API error (typically a 404 "model not found") rather than falling back to another model. Google renamed models in late 2024: the names used here are gemini-2.0-flash, gemini-2.5-pro, and gemini-2.5-pro-exp-05-21. Code copy-pasted from old tutorials fails.
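A model ID can be checked against the list of available models before the first request. The sketch below assumes model names are listed with a `models/` prefix, which is how `genai.list_models()` reports them as far as I know; `resolve_model_id` is a hypothetical helper that works on a plain list of names.

```python
def resolve_model_id(wanted, available_names):
    """Return the listed name matching `wanted`, or raise LookupError."""

    # Accept IDs with or without the "models/" prefix.
    def strip_prefix(name):
        return name[len("models/"):] if name.startswith("models/") else name

    target = strip_prefix(wanted)
    for name in available_names:
        if strip_prefix(name) == target:
            return name
    raise LookupError(
        f"Model {wanted!r} is not available; check the current model list."
    )


# Intended use (requires google-generativeai and a configured key):
#   names = [m.name for m in genai.list_models()
#            if "generateContent" in m.supported_generation_methods]
#   model_id = resolve_model_id("gemini-2.0-flash", names)
```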
Error recovery
ValueError: Invalid model: gemini-pro
google.api_core.exceptions.InvalidArgument: 400 Invalid API key
google.api_core.exceptions.ResourceExhausted: 429 Resource exhausted
Experienced dev note
Start every project with Flash and only upgrade to Pro or Ultra if Flash fails (timeout, reasoning error, or unacceptable output quality). This saves 95% of API costs for 80% of tasks. Additionally, model names change quarterly, so don't hardcode them. Query genai.list_models() once at startup, cache the available model IDs, and fail explicitly if your target model disappears.
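The "start with Flash, upgrade on failure" advice can be sketched as an escalation loop. Everything here is an illustrative assumption: `generate_with_escalation` is a hypothetical helper, and `generate` and `is_acceptable` are caller-supplied callables standing in for the real API call and your own quality check.

```python
def generate_with_escalation(prompt, tiers, generate, is_acceptable):
    """Try each tier in order; return (tier, text) for the first acceptable answer.

    generate(tier, prompt) performs the call; any exception it raises
    (for example a timeout) triggers escalation to the next tier.
    """
    last_error = None
    for tier in tiers:
        try:
            text = generate(tier, prompt)
        except Exception as exc:  # failure on this tier: move to the next one
            last_error = exc
            continue
        if is_acceptable(text):
            return tier, text
    raise RuntimeError(
        f"No tier in {list(tiers)} produced an acceptable answer"
    ) from last_error


# Stand-in generator for demonstration: Flash times out, Pro answers.
def fake_generate(tier, prompt):
    if tier == "flash":
        raise TimeoutError("too slow")
    return f"{tier}: answer"


result = generate_with_escalation(
    "question", ["flash", "pro", "ultra"], fake_generate, lambda text: True
)
print(result)
```

Ordering the tiers from cheapest to most expensive means the costly models are only billed for the requests that actually need them.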
Check your understanding
If you're building a high-volume chatbot that needs <200ms latency and processes 10M messages/month, which model should you choose and why? What would happen to your budget if you picked the 'safest' option instead?
Answer hint
Flash has the lowest latency and cost; Pro and Ultra add latency and multiply costs. The 'safest default' Pro would cost 20x more for the same task. Match model to constraint (latency or reasoning complexity), not to safety.