API Beginner easy · 5 min

generation_config: controlling output

What you will learn

Use <code>generation_config</code> to control temperature and token limits, and its companion <code>safety_settings</code> argument to set safety thresholds, when calling the Gemini API.

Why this matters

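By default, Gemini samples with settings that favor creative, variable responses. In production you usually want more predictable outputs, bounded response length for cost control, and explicit safety thresholds for compliance. <code>generation_config</code>, together with the separate <code>safety_settings</code> argument, is how you enforce those constraints without changing your prompt.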

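Skip if: You can leave <code>generation_config</code> at its defaults if you're prototyping quickly or building chatbots where variety is intentional. For complex structured output, don't rely on temperature alone: set <code>response_mime_type</code> and <code>response_schema</code>, which are themselves <code>generation_config</code> fields.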
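Structured output is configured through generation_config too. Below is a minimal sketch of the dict form. It assumes google-generativeai 0.8.x accepts a TypedDict class as response_schema, as its documentation shows for the dataclass form. SentimentResult and parse_result are hypothetical names, and no API call is made here. In real code you would pass structured_config as generation_config= to generate_content().

```python
import json
from typing import TypedDict


class SentimentResult(TypedDict):
    # Hypothetical schema for the sentiment example; field names are illustrative.
    label: str
    reason: str


# Dict form of generation_config. response_mime_type and response_schema are
# GenerationConfig fields, so structured output sits next to temperature.
structured_config = {
    "temperature": 0.0,
    "max_output_tokens": 200,
    "response_mime_type": "application/json",
    "response_schema": SentimentResult,
}


def parse_result(raw_text: str) -> SentimentResult:
    """Parse the JSON text the model returns under this config."""
    data = json.loads(raw_text)
    return SentimentResult(label=data["label"], reason=data["reason"])


if __name__ == "__main__":
    # Stand-in for response.text; nothing is sent to the API.
    print(parse_result('{"label": "negative", "reason": "product broke"}'))
```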

Explanation

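What it does: generation_config is a parameter of generate_content() that controls how the model generates text. You can pass a genai.types.GenerationConfig object or a plain dict with the same keys. It sets temperature (randomness), max_output_tokens (length), top_p and top_k (which tokens are eligible), structured-output options, and other generation parameters. Safety thresholds are not part of it. They go in the separate safety_settings argument.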

How it works: The values in generation_config travel with the request, and Gemini's backend applies them while it decodes tokens. Temperature rescales the probability distribution over candidate next tokens. Lower values (0.0–0.5) make outputs more focused and repeatable, and higher values (0.7–2.0) increase creativity and variation. top_k and top_p further restrict which tokens can be chosen. max_output_tokens hard-caps the response length, which bounds cost. Safety thresholds, set through safety_settings, filter content by category (harassment, hate speech, sexually explicit content, dangerous content) at the blocking level you specify.
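To build intuition for temperature, top_k, and top_p, here is a toy sketch over made-up next-token scores. It is not Gemini's decoder, which isn't published. It applies only the textbook definitions:

1. Divide the logits by the temperature and apply softmax.
2. Keep the top_k most likely tokens.
3. Keep the smallest set whose cumulative probability reaches top_p.

Treating temperature 0 as pure greedy decoding is a simplifying assumption of this sketch.

```python
import math


def next_token_distribution(logits, temperature, top_k=None, top_p=None):
    """Return {token: probability} after temperature, top-k, and top-p filtering.

    Textbook definitions for illustration only; not Gemini's actual decoder.
    """
    if temperature <= 0:
        # Simplifying assumption: temperature 0 means greedy, all mass on the best token.
        best = max(logits, key=logits.get)
        return {best: 1.0}
    # Temperature scaling: a smaller temperature gives a sharper distribution.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}  # numerically stable softmax
    total = sum(exps.values())
    ranked = sorted(((e / total, tok) for tok, e in exps.items()), reverse=True)

    if top_k is not None:
        ranked = ranked[:top_k]  # keep only the k most likely tokens
    if top_p is not None:
        kept, cumulative = [], 0.0
        for prob, tok in ranked:
            kept.append((prob, tok))
            cumulative += prob
            if cumulative >= top_p:
                break  # smallest prefix whose probability mass reaches top_p
        ranked = kept
    mass = sum(prob for prob, _ in ranked)
    return {tok: prob / mass for prob, tok in ranked}  # renormalize the survivors


if __name__ == "__main__":
    logits = {"negative": 2.0, "neutral": 1.0, "positive": 0.5}  # made-up scores
    for t in (0.0, 0.5, 1.0, 2.0):
        print(t, next_token_distribution(logits, t))
```

Running the demo shows the probability of the top token shrinking as temperature rises, which is the "more variation" the paragraph describes.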

When to use it: Include generation_config in production code. Use temperature=0.0 or 0.1 for factual tasks (summarization, extraction, classification) and temperature=0.7–1.0 for creative tasks (brainstorming, storytelling). Always set max_output_tokens so a single response can't grow longer, and cost more, than you expect.
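One way to apply these guidelines consistently is to keep per-task presets in the dict form of generation_config. This is a minimal sketch. The task names and exact numbers are illustrative choices taken from the ranges above and the limits in the Cost section, not values recommended by Google. GENERATION_PRESETS and generation_config_for are hypothetical names.

```python
# Hypothetical per-task presets; values follow the ranges discussed in this lesson.
GENERATION_PRESETS = {
    "classification": {"temperature": 0.0, "max_output_tokens": 500},
    "summarization": {"temperature": 0.1, "max_output_tokens": 2000},
    "brainstorming": {"temperature": 0.9, "max_output_tokens": 2000},
}


def generation_config_for(task: str, **overrides) -> dict:
    """Return a copy of the preset for `task`, with optional per-call overrides."""
    if task not in GENERATION_PRESETS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(GENERATION_PRESETS)}")
    config = dict(GENERATION_PRESETS[task])  # copy so callers can't mutate the shared preset
    config.update(overrides)
    # Enforce the "always set max_output_tokens" rule even after overrides.
    if not config.get("max_output_tokens"):
        raise ValueError("max_output_tokens must be set to a positive value")
    return config
```

You would then call model.generate_content(prompt, generation_config=generation_config_for("classification")). Copying the preset keeps one caller's override from leaking into later requests.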

Request code

python

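import os

import google.generativeai as genai

# os.environ[...] raises KeyError right away if GOOGLE_API_KEY is unset.
genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-2.0-flash')

response = model.generate_content(
    'Classify this sentiment: "The product broke after one week."',
    # Sampling and length controls belong in GenerationConfig.
    generation_config=genai.types.GenerationConfig(
        temperature=0.0,        # favor the most likely tokens
        max_output_tokens=100,  # hard cap on response length
        top_p=0.9,
        top_k=40,
    ),
    # Safety thresholds are a separate argument, not a GenerationConfig field.
    safety_settings=[
        {
            'category': genai.types.HarmCategory.HARM_CATEGORY_HARASSMENT,
            'threshold': genai.types.HarmBlockThreshold.BLOCK_NONE,
        },
        {
            'category': genai.types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
            'threshold': genai.types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
        },
    ],
)

# response.text raises ValueError when there is no text (e.g. blocked output),
# so inspect the candidate and its finish_reason before reading it.
if not response.candidates:
    print('Prompt blocked:', response.prompt_feedback)
elif response.candidates[0].finish_reason.name == 'STOP':
    print(response.text)
else:
    print('Stopped early:', response.candidates[0].finish_reason.name)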

Authentication

Set the GOOGLE_API_KEY environment variable before running. In a bash or zsh shell, for example: export GOOGLE_API_KEY='your-api-key-here'. Get your key from https://aistudio.google.com/app/apikey.

Response shape

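Field	Description
`response.text`	string: the generated text; raises ValueError if the candidate has no text (for example, when output was blocked)
`response.candidates[0].finish_reason`	enum: why generation stopped (STOP, MAX_TOKENS, SAFETY, ...); use `.name` to get the string
`response.candidates[0].safety_ratings`	list of SafetyRating objects, each with a `category` and a `probability` of harmful content
`response.prompt_feedback`	why the prompt itself was blocked, if it was (`block_reason`); `candidates` is then empty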

Field guide

text

The actual generated output you care about. Use it for business logic, but only after checking finish_reason.

finish_reason

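Critical for debugging. If it's MAX_TOKENS, the response was truncated. If it's SAFETY, the output was blocked. The candidate typically has no text in that case, so response.text raises ValueError rather than returning empty or partial text.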
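Here is a minimal sketch of branching on finish_reason before touching response.text. In google-generativeai, finish_reason lives on each candidate (response.candidates[0].finish_reason) and is an enum whose .name is 'STOP', 'MAX_TOKENS', 'SAFETY', and so on. To keep the example runnable without an API key, it is exercised with stand-in objects built from types.SimpleNamespace. extract_text and the "PROMPT_BLOCKED" status label are hypothetical.

```python
from types import SimpleNamespace


def extract_text(response):
    """Return (text, status) without calling response.text on a blocked response."""
    if not response.candidates:
        # The prompt itself was blocked; "PROMPT_BLOCKED" is a hypothetical status label.
        return None, "PROMPT_BLOCKED"
    reason = response.candidates[0].finish_reason
    name = getattr(reason, "name", str(reason))  # enum in the SDK, plain str in the stand-ins
    if name == "STOP":
        return response.text, name
    if name == "MAX_TOKENS":
        # Truncated: the partial text may still be useful, but the caller sees the flag.
        return response.text, name
    # SAFETY, RECITATION, OTHER, ...: treat as having no usable text.
    return None, name


if __name__ == "__main__":
    ok = SimpleNamespace(text="negative", candidates=[SimpleNamespace(finish_reason="STOP")])
    blocked = SimpleNamespace(candidates=[SimpleNamespace(finish_reason="SAFETY")])
    print(extract_text(ok))
    print(extract_text(blocked))
```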

safety_ratings

Often ignored but essential for compliance. It shows which categories the model flagged, and how likely each is, even when the output wasn't blocked.
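Because safety_ratings come back even when nothing is blocked, you can gate user-facing output on them. This is a minimal sketch. It assumes each rating exposes category and probability enums whose .name values follow the API's naming: HARM_CATEGORY_HATE_SPEECH for categories, and NEGLIGIBLE, LOW, MEDIUM, and HIGH for probabilities. Which categories and which level trigger the fallback is a policy choice. The defaults and the should_fall_back name are hypothetical.

```python
from types import SimpleNamespace

# Probability levels ordered from least to most likely harmful.
PROBABILITY_ORDER = ["NEGLIGIBLE", "LOW", "MEDIUM", "HIGH"]


def _name(value):
    # SDK values are enums with .name; plain strings pass through unchanged.
    return getattr(value, "name", str(value))


def should_fall_back(safety_ratings, categories=("HARM_CATEGORY_HATE_SPEECH",), min_level="MEDIUM"):
    """True if any rating in `categories` is at or above `min_level` (hypothetical policy)."""
    threshold = PROBABILITY_ORDER.index(min_level)
    for rating in safety_ratings:
        level = _name(rating.probability)
        # Unknown levels (e.g. an UNSPECIFIED value) are ignored rather than guessed at.
        if _name(rating.category) in categories and level in PROBABILITY_ORDER:
            if PROBABILITY_ORDER.index(level) >= threshold:
                return True
    return False


if __name__ == "__main__":
    # Stand-in ratings shaped like the SDK's SafetyRating objects.
    ratings = [
        SimpleNamespace(category="HARM_CATEGORY_HATE_SPEECH", probability="MEDIUM"),
        SimpleNamespace(category="HARM_CATEGORY_HARASSMENT", probability="LOW"),
    ]
    print(should_fall_back(ratings))
```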

Setup trap

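With the code above, an unset GOOGLE_API_KEY fails immediately with KeyError. The harder case is a variable that is set but wrong, for example stray quotes, trailing whitespace or a newline, or a revoked key. The call then fails with google.api_core.exceptions.InvalidArgument and a 400 "API key not valid" message. That error doesn't say "key not found", so it can look like a permissions issue. Verify the value before configuring, but avoid printing the full key to logs. Its length or last few characters are enough to confirm which key is loaded.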
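Here is a minimal sketch of checking the key without echoing it. The checks cover the mistakes described above: a missing or blank value, surrounding whitespace, and surrounding quotes. The function name and its messages are illustrative. Passing the environment in as a mapping keeps the function testable.

```python
import os
from collections.abc import Mapping


def check_api_key(env: Mapping = os.environ, name: str = "GOOGLE_API_KEY") -> str:
    """Return the key if it looks sane; raise with a message that never reveals it."""
    value = env.get(name)
    if value is None:
        raise RuntimeError(f"{name} is not set")
    if not value.strip():
        raise RuntimeError(f"{name} is set but empty")
    if value != value.strip():
        raise RuntimeError(f"{name} has leading or trailing whitespace")
    if value[0] in "'\"" or value[-1] in "'\"":
        raise RuntimeError(f"{name} appears to include quote characters")
    return value
```

You would call genai.configure(api_key=check_api_key()). For logging, f"{len(key)} chars, ends with ...{key[-4:]}" identifies the key without exposing it.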

Cost

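You pay for both prompt (input) and response (output) tokens. Rates vary by model and usually differ for input and output. <code>max_output_tokens</code> is a ceiling, not a charge: you're billed for the tokens actually generated, and the cap bounds the worst case. A response that actually uses all of <code>max_output_tokens=10000</code> can cost roughly $0.03–0.10 on higher-priced models, and far less on Flash-class models. Set conservative limits: <code>max_output_tokens=500</code> for classification, <code>2000</code> for summaries, <code>4000</code> for detailed analysis.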
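Because max_output_tokens bounds output length, you can compute a worst-case cost per call before sending it. This is a minimal sketch. The per-million-token prices below are placeholders, not real Gemini rates, so substitute current numbers from Google's pricing page for your model.

```python
def worst_case_cost_usd(
    prompt_tokens: int,
    max_output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Upper bound on one call's cost: the full prompt plus a response that hits the cap."""
    input_cost = prompt_tokens * input_price_per_million / 1_000_000
    output_cost = max_output_tokens * output_price_per_million / 1_000_000
    return input_cost + output_cost


# Placeholder prices in USD per million tokens (hypothetical, for illustration only).
PLACEHOLDER_INPUT_PRICE = 1.00
PLACEHOLDER_OUTPUT_PRICE = 5.00

if __name__ == "__main__":
    for cap in (500, 2000, 4000, 10000):
        cost = worst_case_cost_usd(1000, cap, PLACEHOLDER_INPUT_PRICE, PLACEHOLDER_OUTPUT_PRICE)
        print(f"max_output_tokens={cap:>5}: at most ${cost:.4f}")
```

In the SDK, model.count_tokens(prompt).total_tokens gives the prompt token count to plug in for prompt_tokens.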

Rate limits

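Limits vary by model and usage tier (free vs. paid) and change over time. Check the current quotas for your model in the Gemini API documentation rather than hard-coding a number. If you hit them, batch requests or retry with exponential backoff. Throttled requests return HTTP 429 errors, not extra charges.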

Common gotcha

Setting temperature=0.0 does NOT guarantee identical outputs across multiple calls with the same prompt, because there is still some non-determinism on the serving side. If you need exact reproducibility, constrain the output more tightly (through the prompt or response_schema) or normalize the response in post-processing.
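Before reaching for fixes, it helps to measure how much outputs actually vary. This small sketch calls any text-generating function several times on the same prompt and counts distinct outputs. `generate` is a hypothetical callable that you would write as a wrapper around model.generate_content(prompt).text. The fake generator in the demo stands in for the API.

```python
import itertools
from collections import Counter
from typing import Callable


def output_spread(generate: Callable[[str], str], prompt: str, runs: int = 5) -> Counter:
    """Call `generate` `runs` times on the same prompt and count each distinct output."""
    return Counter(generate(prompt) for _ in range(runs))


if __name__ == "__main__":
    # Fake generator that cycles through answers, mimicking non-deterministic output.
    answers = itertools.cycle(["negative", "negative", "Negative."])
    spread = output_spread(lambda _prompt: next(answers), "Classify: ...", runs=6)
    print(f"{len(spread)} distinct outputs in {sum(spread.values())} runs: {dict(spread)}")
```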

Error recovery

google.api_core.exceptions.InvalidArgument

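Usually an API key problem. The same exception is also raised for invalid request parameters, such as an out-of-range value in generation_config, so read the message. For key problems, verify the key exists: <code>assert os.environ.get('GOOGLE_API_KEY')</code>. If it is set but rejected, check for stray quotes or whitespace, or create a new key at https://aistudio.google.com/app/apikey.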

google.api_core.exceptions.ResourceExhausted

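You've hit a rate limit, and quotas depend on model and tier. Retry with exponential backoff, for example <code>import time; time.sleep(2 ** retry_count)</code>. Adding random jitter and a maximum number of retries makes this more robust.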
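The one-liner above sleeps a fixed 2 ** retry_count seconds. This slightly fuller sketch adds a retry cap and random "full jitter", so many clients don't retry in lockstep. It is written generically: you pass the exception types to retry on, which with this SDK would be google.api_core.exceptions.ResourceExhausted. The `sleep` parameter exists only so the example can be tested without waiting. call_with_backoff is a hypothetical helper name.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 32.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on a retryable error, wait with exponential backoff plus jitter and retry."""
    for retry_count in range(max_retries + 1):
        try:
            return fn()
        except retry_on:
            if retry_count == max_retries:
                raise  # out of retries: surface the last error to the caller
            # Full jitter: a random delay between 0 and the capped exponential bound.
            bound = min(max_delay, base_delay * 2 ** retry_count)
            sleep(random.uniform(0, bound))
    raise AssertionError("unreachable")
```

With the SDK, a call would look like call_with_backoff(lambda: model.generate_content(prompt), retry_on=(ResourceExhausted,)).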

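Errors when passing safety_settings

The usual cause is nesting safety_settings inside GenerationConfig, which has no such field and raises TypeError. Pass it as its own argument to generate_content(). A misspelled category or threshold also fails. Prefer the enums, such as <code>genai.types.HarmBlockThreshold.BLOCK_NONE</code>, so your editor can autocomplete them. The SDK also accepts string names like <code>'BLOCK_NONE'</code>, but a typo in a string is only caught when the call runs.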

Experienced dev note

In production, always pair generation_config with response validation. finish_reason='STOP' doesn't mean the output is useful: a model can generate syntactically valid but semantically broken JSON. Check safety_ratings before showing user-facing content. A high flagged probability in HARM_CATEGORY_HATE_SPEECH should trigger a fallback response, even if the content wasn't hard-blocked. Also, top_p and top_k interact with temperature in non-obvious ways, since all three shape which tokens can be picked. If outputs are still too variable at temperature=0.2, try lowering top_p (for example to 0.5) before pushing temperature further down.
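Here is a minimal sketch of "validate before you trust STOP". It parses the text as JSON, checks required keys and allowed values, and returns None so the caller can fall back. The required fields and allowed labels are hypothetical and match the sentiment example, so adapt them to your own schema.

```python
import json
from typing import Optional

# Hypothetical contract for the sentiment-classification example.
REQUIRED_FIELDS = {"label", "reason"}
ALLOWED_LABELS = {"positive", "neutral", "negative"}


def validate_output(raw_text: str) -> Optional[dict]:
    """Return the parsed object if it meets the contract, else None."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        return None  # not JSON at all (e.g. prose, or a response cut off by MAX_TOKENS)
    if not isinstance(data, dict) or not REQUIRED_FIELDS.issubset(data):
        return None  # syntactically valid JSON with the wrong shape
    label = data["label"]
    if not isinstance(label, str) or label not in ALLOWED_LABELS:
        return None  # right shape, semantically invalid value
    return data
```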

Check your understanding

You're building a factual extraction pipeline (pulling structured data from documents). Your model keeps returning slightly different field values on identical input. Which generation_config setting should you adjust first, and why won't temperature=0.0 alone fix this?

Answer hint

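At temperature=0.0, decoding is already close to greedy. The remaining variation comes from non-determinism on the serving side, and neither temperature nor top_p can remove it entirely. A low <code>top_p</code> (for example <code>0.1</code>) narrows the token choice space at nonzero temperatures, but it won't make outputs identical. The more reliable fixes are: (1) constrain the output format with <code>response_mime_type='application/json'</code> and a <code>response_schema</code>, and (2) validate and normalize field values after generation. Temperature alone is insufficient for this use case.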
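For the extraction pipeline, normalizing values after generation can make near-identical answers compare equal. This is a minimal sketch. The rules (trim, collapse whitespace, strip trailing punctuation, case-fold) are example choices, and real pipelines usually also need field-specific rules such as date or currency parsing. normalize_value and normalize_record are hypothetical names.

```python
import re
import string


def normalize_value(value: str) -> str:
    """Canonicalize a free-text field so trivial variations compare equal."""
    value = re.sub(r"\s+", " ", value.strip())  # trim and collapse internal whitespace
    value = value.rstrip(string.punctuation + " ")  # drop trailing punctuation like '.' or ','
    return value.casefold()


def normalize_record(record: dict) -> dict:
    """Normalize every string field in an extracted record; leave other types unchanged."""
    return {
        key: normalize_value(val) if isinstance(val, str) else val
        for key, val in record.items()
    }


if __name__ == "__main__":
    a = normalize_record({"vendor": "Acme Corp.", "total": 120})
    b = normalize_record({"vendor": "  acme   corp", "total": 120})
    print(a == b)
```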

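VERSION In google-generativeai 0.8.x, genai.types.GenerationConfig is a dataclass, and generate_content() also accepts a plain dict with the same keys. Prefer instantiating the dataclass as shown: an unsupported or misspelled field (such as safety_settings) raises TypeError at construction, which catches parameter errors early.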
