Accessing execution output

What you will learn
Extract and structure the text, usage metadata, and finish reason from Gemini API responses using the correct response object accessors.

Why this matters

It's common to call generate_content() and then read the response incorrectly: usage metrics and finish reasons get ignored, or the code raises AttributeError or ValueError by reaching for fields that don't exist or aren't populated. Knowing the exact response shape prevents silent failures and lets you monitor token usage for cost control.

Skip if: you only need the raw text and never care about token counts, finish reasons, or streaming chunks. In that case response.text is all you need (str(response) prints the whole response object, not just the text). Once you're in production, though, you'll want usage metadata and safety ratings, so learn the proper accessors now.

Explanation

What it does: The generate_content() method returns a GenerateContentResponse object that contains the model's output text, token usage metrics, finish reason, and safety ratings. You don't get a plain string: you get a structured object with multiple fields.

How it works: When you call model.generate_content('prompt'), the Gemini API returns a GenerateContentResponse. The main text is available through response.text, but the object also carries response.usage_metadata (input/output token counts), response.candidates[0].finish_reason (why generation stopped), and response.candidates[0].safety_ratings. response.text is a convenience accessor: it joins the text of all parts of the single returned candidate, and it raises ValueError if that candidate has no parts (for example, when the output was blocked).

When to use it: Always parse execution output properly in production. You need token counts for cost tracking and billing, finish reasons to detect truncated or blocked responses, and safety ratings to understand content filtering. Accessing these fields lets you log metrics, retry on specific conditions, and debug why a request succeeded or failed.

Request code

```python
import os

import google.generativeai as genai

# The key is read from the environment when configure() runs.
genai.configure(api_key=os.environ['GOOGLE_API_KEY'])

model = genai.GenerativeModel('gemini-2.0-flash')
response = model.generate_content('Explain quantum computing in one sentence.')

# Token counts for cost tracking.
usage = response.usage_metadata
print(f'Input tokens: {usage.prompt_token_count}')
print(f'Output tokens: {usage.candidates_token_count}')
print(f'Total tokens: {usage.total_token_count}')

candidate = response.candidates[0]
# finish_reason is an enum; .name gives a label such as 'STOP' or 'SAFETY'.
print(f'Finish reason: {candidate.finish_reason.name}')

# Check finish_reason first: .text raises ValueError when the candidate has no parts.
if candidate.finish_reason.name in ('STOP', 'MAX_TOKENS'):
    print(f'Text output: {response.text}')

# One rating per harm category, each with a probability level.
for rating in candidate.safety_ratings:
    print(f'{rating.category.name}: {rating.probability.name}')
```

Authentication

Set your Google API key before calling genai.configure(). Exporting GOOGLE_API_KEY in your shell keeps it out of source code; for quick experiments you can also set it from Python:

```python
import os
os.environ['GOOGLE_API_KEY'] = 'your-api-key-here'
```

Then configure the client:

```python
import google.generativeai as genai
genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
```

The SDK takes the key at configure() time, not at import time.

Response shape

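| Field | Description |
|---|---|
| text | Text of all parts of the single candidate, joined into one string |
| usage_metadata | Token counts: prompt_token_count, candidates_token_count, total_token_count |
| candidates | List of candidates; each has content, finish_reason, and safety_ratings |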

Field guide

text

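The easiest accessor: returns the text of all parts of the response's single candidate, joined into one string. Use it for simple output retrieval, but check finish_reason first, because it raises ValueError when the candidate has no parts.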

usage_metadata

Critical for production: tracks how many tokens you consumed. Input and output tokens are priced differently, so multiply prompt_token_count by the input rate and candidates_token_count by the output rate, then add the two to get the per-request cost.
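A minimal sketch of that arithmetic. The rates are parameters because pricing differs by model and changes over time; the figures in the example call are the gemini-2.0-flash rates quoted in the Cost section of this lesson, treated here as assumptions rather than authoritative prices. estimate_cost_usd is a hypothetical helper name.

```python
def estimate_cost_usd(prompt_tokens: int, output_tokens: int,
                      input_rate_per_million: float,
                      output_rate_per_million: float) -> float:
    """Estimate the cost of one request from its token counts.

    prompt_tokens  -> usage_metadata.prompt_token_count
    output_tokens  -> usage_metadata.candidates_token_count
    Rates are USD per 1M tokens and should come from your own pricing source.
    """
    input_cost = prompt_tokens * input_rate_per_million / 1_000_000
    output_cost = output_tokens * output_rate_per_million / 1_000_000
    return input_cost + output_cost


# Example: 12 input tokens and 40 output tokens at the rates this lesson
# quotes for gemini-2.0-flash ($0.075 and $0.30 per 1M tokens; verify first).
cost = estimate_cost_usd(12, 40, 0.075, 0.30)
print(f"${cost:.8f}")
```

In real code you would pass response.usage_metadata.prompt_token_count and response.usage_metadata.candidates_token_count instead of literals.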

finish_reason

Tells you why generation stopped. finish_reason is an enum, so compare its .name rather than a raw string. STOP = normal completion. MAX_TOKENS = truncated (raise max_output_tokens for similar requests). SAFETY = content policy blocked the output (check safety_ratings to see why).
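If MAX_TOKENS shows up, one option is to retry with a larger output budget. The sketch below is a hypothetical helper, not part of the SDK: it takes a caller-supplied generate(prompt, max_output_tokens) adapter (so the logic can run without network access), doubles the budget while the finish reason is MAX_TOKENS, and stops at a ceiling. The default ceiling of 8192 tokens is an assumption; check your model's output limit.

```python
from typing import Any, Callable


def generate_until_complete(generate: Callable[[str, int], Any], prompt: str,
                            start_tokens: int = 1024,
                            max_tokens_cap: int = 8192) -> Any:
    """Retry with a doubled output budget while the model stops on MAX_TOKENS.

    `generate` is a hypothetical adapter you provide; it must return an object
    shaped like GenerateContentResponse (candidates[0].finish_reason.name).
    Returns the last response, which may still be truncated if the cap is hit.
    """
    budget = start_tokens
    while True:
        response = generate(prompt, budget)
        reason = response.candidates[0].finish_reason.name
        # Stop on any other finish reason, or once the ceiling has been tried.
        if reason != "MAX_TOKENS" or budget >= max_tokens_cap:
            return response
        budget = min(budget * 2, max_tokens_cap)
```

With google-generativeai, the adapter could be lambda p, n: model.generate_content(p, generation_config={'max_output_tokens': n}). Each retry regenerates the whole answer and pays for the input tokens again, which is why the loop is capped.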

safety_ratings

Often overlooked: each rating pairs a harm category (harassment, hate speech, and so on) with a probability level such as NEGLIGIBLE, LOW, MEDIUM, or HIGH. Inspect them when a response is blocked, and log them for compliance and to understand filtering behavior.
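For compliance logging it can help to flatten the ratings into a plain dict and flag anything at or above a chosen level. A sketch, assuming the 0.8.x shape used in the request code (each rating has .category.name and .probability.name) and the NEGLIGIBLE < LOW < MEDIUM < HIGH ordering; summarize_safety is a hypothetical helper, and the MEDIUM threshold is an arbitrary example.

```python
from typing import Iterable

# Probability levels in increasing order of severity.
_LEVELS = ["NEGLIGIBLE", "LOW", "MEDIUM", "HIGH"]


def summarize_safety(ratings: Iterable, flag_at: str = "MEDIUM") -> dict:
    """Return {"ratings": {category: level}, "flagged": [categories]}.

    Levels not in _LEVELS (e.g. HARM_PROBABILITY_UNSPECIFIED) are recorded
    but never flagged.
    """
    threshold = _LEVELS.index(flag_at)
    summary = {}
    flagged = []
    for rating in ratings:
        category = rating.category.name
        level = rating.probability.name
        summary[category] = level
        if level in _LEVELS and _LEVELS.index(level) >= threshold:
            flagged.append(category)
    return {"ratings": summary, "flagged": flagged}
```

Usage: summarize_safety(response.candidates[0].safety_ratings), then write the result to your log alongside the request ID.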

Setup trap

The google-generativeai SDK takes the key at genai.configure() time, not at import time. If you set os.environ['GOOGLE_API_KEY'] after calling genai.configure(), the client will NOT pick up the key; with the pattern above, genai.configure(api_key=os.environ['GOOGLE_API_KEY']) raises KeyError if the variable is still unset. Set the environment variable first, then configure the client.
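One way to avoid the ordering trap is to fail fast with a readable message before configure() runs. A minimal sketch; require_env is a hypothetical helper name, and it accepts the environment mapping as an optional parameter so it can be tested without touching the real environment.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of an environment variable or raise a clear error.

    Call this before genai.configure() so a missing key produces a readable
    error at startup rather than a bare KeyError or a later auth failure.
    """
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before configuring the client")
    return value
```

Usage: genai.configure(api_key=require_env('GOOGLE_API_KEY')).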

Cost

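Each API call is billed on input and output tokens. gemini-2.0-flash costs approximately $0.075 per 1M input tokens and $0.30 per 1M output tokens (April 2026 pricing); rates change, so confirm them on Google's pricing page. Log response.usage_metadata.prompt_token_count and candidates_token_count per request, not only total_token_count: at these rates an output token costs four times as much as an input token. A single call is cheap, but long generations add up across many requests, so set max_output_tokens no higher than you need.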

Rate limits

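Google's free tier enforces 15 requests per minute for gemini-2.0-flash; the paid tier raises this to 1,000 RPM. Limits differ by tier and change over time, so check the current rate-limit table. Exceeding a limit returns HTTP 429, which google-generativeai surfaces as google.api_core.exceptions.ResourceExhausted on its default gRPC transport. Retry those errors with exponential backoff.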
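Exponential backoff (sleep 2 ** retry_count seconds, up to 3 retries) can be sketched as a small wrapper. It assumes you pass a predicate that recognizes rate-limit errors; with google-generativeai that is typically an isinstance check against google.api_core.exceptions.ResourceExhausted. with_backoff is a hypothetical helper, and the sleep function is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(call: Callable[[], T],
                 is_retryable: Callable[[Exception], bool],
                 max_retries: int = 3,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying retryable errors with exponential backoff.

    Waits base_delay * 2**attempt seconds (1, 2, 4 by default) between
    attempts and re-raises after max_retries retries or on any
    non-retryable error.
    """
    attempt = 0
    while True:
        try:
            return call()
        except Exception as exc:
            if attempt >= max_retries or not is_retryable(exc):
                raise
            sleep(base_delay * 2 ** attempt)
            attempt += 1
```

For example, after from google.api_core.exceptions import ResourceExhausted: with_backoff(lambda: model.generate_content(prompt), lambda e: isinstance(e, ResourceExhausted)).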

Common gotcha

Accessing response.text on a response whose finish_reason is SAFETY does not give you usable text: in google-generativeai 0.8.x it raises ValueError, because the blocked candidate has no parts. Code that only reads the text either crashes there or, if the error is swallowed, misses the safety block entirely. Always check response.candidates[0].finish_reason.name before trusting the output.
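The check-before-read pattern can be wrapped in a small helper. A sketch, assuming the 0.8.x response shape: an empty candidates list when the prompt is blocked, an enum finish_reason with .name, and a .text accessor that raises ValueError when there are no parts. extract_text is a hypothetical helper name.

```python
from typing import Any, Optional, Tuple


def extract_text(response: Any) -> Tuple[Optional[str], str]:
    """Return (text, status) without trusting response.text blindly.

    status is the finish reason name ('STOP', 'MAX_TOKENS', 'SAFETY', ...)
    or 'PROMPT_BLOCKED' when no candidate was generated. text is None
    whenever there is no usable output; for 'MAX_TOKENS' it is truncated.
    """
    if not response.candidates:
        # The prompt itself was blocked; see response.prompt_feedback.
        return None, "PROMPT_BLOCKED"
    status = response.candidates[0].finish_reason.name
    if status not in ("STOP", "MAX_TOKENS"):
        return None, status
    try:
        return response.text, status
    except ValueError:
        # .text raises when the candidate has no parts.
        return None, status
```

Callers can then branch on the status string (log SAFETY blocks, retry MAX_TOKENS) instead of catching exceptions around every .text access.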

Error recovery

google.api_core.exceptions.ResourceExhausted (HTTP 429)
Rate limit hit. Implement exponential backoff: sleep for 2 ** retry_count seconds and retry up to 3 times.
AttributeError: 'GenerateContentResponse' object has no attribute 'output'
Wrong field name: use response.text, not response.output. The old OpenAI pattern doesn't apply here.
IndexError: list index out of range when accessing candidates[0]
No candidates were generated, usually because the prompt itself was blocked. Check whether response.candidates is empty and read response.prompt_feedback.block_reason before indexing into it.
ValueError when reading response.text
The candidate has no parts, typically because finish_reason is SAFETY and generation was blocked by policy. Inspect safety_ratings to see which categories triggered the block, then rephrase your prompt.

Experienced dev note

The biggest win here is usage_metadata: track it religiously in production. Log total_token_count with timestamp and user_id so you can identify who's burning through quota. The second insight: finish_reason is your canary. If you see MAX_TOKENS frequently, increase max_output_tokens. If you see SAFETY frequently, your users are hitting content policy: that's a product signal, not a bug.
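A minimal sketch of such a log line, assuming you have a user_id from your own application and emit JSON lines; usage_log_record is a hypothetical helper, and the field names are illustrative rather than any standard schema.

```python
import json
from datetime import datetime, timezone
from typing import Any


def usage_log_record(response: Any, user_id: str) -> str:
    """Build one JSON log line with token usage and finish reason.

    Expects the google-generativeai 0.8.x response shape; a prompt blocked
    before generation has no candidates and is logged as PROMPT_BLOCKED.
    """
    usage = response.usage_metadata
    reason = (response.candidates[0].finish_reason.name
              if response.candidates else "PROMPT_BLOCKED")
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "prompt_tokens": usage.prompt_token_count,
        "output_tokens": usage.candidates_token_count,
        "total_tokens": usage.total_token_count,
        "finish_reason": reason,
    }
    return json.dumps(record)
```

Logging prompt and output tokens separately, not only the total, keeps the record usable for cost estimates later.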

Check your understanding

You call generate_content() and get back a response whose finish_reason is SAFETY, and reading response.text raises ValueError instead of returning text. What went wrong, and how would you detect this programmatically without manually inspecting the response?

Answer hint

The model refused to generate because the output violated safety policy. Detect this by checking if response.candidates[0].finish_reason.name == 'SAFETY' before using response.text. Then inspect response.candidates[0].safety_ratings to log which category (harassment, hate speech, etc.) triggered the block.

Version note

This lesson targets google-generativeai 0.8.x. Older 0.1.x versions had different accessors (e.g., .result instead of .text). Always pin your version, for example pip install "google-generativeai==0.8.*", to avoid surprise breaking changes.
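Pinning belongs in your requirements file, but a startup check can catch an environment that has drifted. A sketch using the standard library's importlib.metadata; the helper name and the "0.8." prefix are assumptions that mirror the pin suggested here.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_sdk_matches(prefix: str = "0.8.",
                          package: str = "google-generativeai") -> bool:
    """Return True if the installed package version starts with `prefix`."""
    try:
        return version(package).startswith(prefix)
    except PackageNotFoundError:
        # The package is not installed in this environment.
        return False
```

Call it once at startup and log a warning or abort if it returns False.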
