API Beginner easy · 5 min

candidates: multiple completions

What you will learn
Use the <code>candidate_count</code> parameter to request multiple independent response variants from Gemini in a single API call.

Why this matters

When you need multiple creative options, diverse approaches, or want to compare response quality without making multiple API calls, <code>candidate_count</code> lets you generate 1–8 responses in a single request. Input tokens are billed once; output tokens are billed for every candidate.

Skip if: you need to continue a conversation from a specific response (each candidate is independent), or you need deterministic, single-answer behavior such as fact-based Q&A, where every candidate should be identical. For sequential follow-ups, make separate API calls instead.

Explanation

What it does: The candidate_count parameter in generate_content() tells Gemini to generate multiple independent response variants and return all of them in the candidates list. You specify a number (1–8), and the API returns that many complete responses to the same prompt.

How it works: Each candidate is generated independently by the model with the same temperature and system instructions. The API processes them in parallel within a single request, then returns all of them together. The finish_reason field on each candidate tells you why that specific response stopped (e.g., STOP for natural completion, MAX_TOKENS if it hit the limit).

When to use it: Use this when brainstorming (get multiple creative ideas), evaluating model behavior (compare outputs), A/B testing responses, or when downstream systems need choice. Avoid it for cost-sensitive, latency-critical, or deterministic tasks where you only want one answer.

Request code

python
import os
import google.generativeai as genai

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])

model = genai.GenerativeModel('gemini-2.0-flash')

response = model.generate_content(
    'Write a short sci-fi story premise in 2 sentences.',
    generation_config=genai.types.GenerationConfig(
        candidate_count=3,
        temperature=0.9,
        max_output_tokens=150
    )
)

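# The API may return fewer candidates than requested, so report the count.
print(f'Number of candidates: {len(response.candidates)}')
for i, candidate in enumerate(response.candidates):
    print(f'\n--- Candidate {i+1} ---')
    # A filtered candidate can come back with no parts; guard before indexing.
    if candidate.content.parts:
        print(candidate.content.parts[0].text)
    print(f'Finish reason: {candidate.finish_reason}')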

Authentication

Set your API key before running:

```bash
export GOOGLE_API_KEY="your-actual-api-key"
```

Obtain it from Google AI Studio (https://aistudio.google.com/apikey). The request code reads this environment variable with os.environ and passes it to genai.configure().

Response shape

| Field | Description |
| --- | --- |
| candidates | List of Candidate objects, one per returned candidate |
| candidates[0].content | The generated content for this candidate |
| candidates[0].content.parts | List of content parts (usually one text part) |
| candidates[0].content.parts[0].text | The generated text string |
| candidates[0].finish_reason | Enum indicating why generation stopped: STOP (natural completion), MAX_TOKENS (limit hit), SAFETY (content filtered), RECITATION (blocked as recitation) |
| candidates[0].safety_ratings | List of safety rating objects, each with a category and a probability |

Field guide

candidates

Always check the length of this list: it may hold fewer candidates than requested if the API encountered errors or safety issues.

finish_reason

Developers often miss this: if finish_reason is MAX_TOKENS, the response was truncated. Increase max_output_tokens or accept incomplete answers.
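A minimal sketch of that truncation check. It assumes finish_reason is either an enum member with a .name attribute or a plain string; the helper names finish_reason_name and truncated_indices are hypothetical, and the stand-in objects exist only so the snippet runs without an API call.

```python
from types import SimpleNamespace


def finish_reason_name(candidate):
    """Return the finish reason as an upper-case string such as 'STOP'."""
    reason = candidate.finish_reason
    # Enum members expose .name; plain strings are used as-is.
    return str(getattr(reason, "name", reason)).upper()


def truncated_indices(candidates):
    """Indices of candidates that stopped because they hit the token limit."""
    return [
        i for i, c in enumerate(candidates)
        if finish_reason_name(c) == "MAX_TOKENS"
    ]


# Stand-in candidates (hypothetical data, not real API output).
demo = [
    SimpleNamespace(finish_reason="STOP"),
    SimpleNamespace(finish_reason="MAX_TOKENS"),
    SimpleNamespace(finish_reason="STOP"),
]
print(truncated_indices(demo))  # [1]
```

With a real response you would pass response.candidates instead of demo, then decide per index whether to retry with a larger max_output_tokens.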

safety_ratings

Each candidate has independent safety evaluations. One candidate might be blocked even if others pass, which is why you can receive fewer candidates than requested.
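One way to inspect the ratings on the candidates you did receive, sketched under assumptions: each rating exposes category and probability as enum members or strings, and the probability levels are named NEGLIGIBLE, LOW, MEDIUM and HIGH. The helper flagged_ratings and the sample data are hypothetical.

```python
from types import SimpleNamespace

# Probability levels in ascending order (assumed enum names).
LEVELS = ["NEGLIGIBLE", "LOW", "MEDIUM", "HIGH"]


def _name(value):
    """Enum members expose .name; plain strings are used as-is."""
    return str(getattr(value, "name", value)).upper()


def flagged_ratings(candidate, min_level="MEDIUM"):
    """Return (category, probability) pairs at or above min_level."""
    threshold = LEVELS.index(min_level)
    flagged = []
    for rating in getattr(candidate, "safety_ratings", None) or []:
        probability = _name(rating.probability)
        if probability in LEVELS and LEVELS.index(probability) >= threshold:
            flagged.append((_name(rating.category), probability))
    return flagged


# Stand-in candidate (hypothetical data, not real API output).
demo_candidate = SimpleNamespace(safety_ratings=[
    SimpleNamespace(category="HARM_CATEGORY_HARASSMENT", probability="NEGLIGIBLE"),
    SimpleNamespace(category="HARM_CATEGORY_DANGEROUS_CONTENT", probability="MEDIUM"),
])
print(flagged_ratings(demo_candidate))
# [('HARM_CATEGORY_DANGEROUS_CONTENT', 'MEDIUM')]
```

Ratings close to the threshold on surviving candidates can hint at which part of the prompt caused the others to be dropped.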

Setup trap

The candidate_count parameter must live inside generation_config, not directly in generate_content(). A call such as generate_content(prompt, candidate_count=3) does not produce three candidates: in the 0.8.x SDK, generate_content() has no candidate_count keyword, so the call is rejected with a TypeError. Pass it through GenerationConfig(), or through a plain dict such as generation_config={'candidate_count': 3}.

Cost

Input tokens are charged once per request, however many candidates you ask for. Output tokens are summed across all candidates, so 3 responses of 100 tokens each are billed as 300 output tokens. This is cheaper than 3 separate API calls, each of which would be charged for the input tokens again.
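The arithmetic can be sketched as a small function. It models only the token counts described here (prompt billed once per request, output summed across candidates), not actual prices; billed_tokens and the numbers are hypothetical.

```python
def billed_tokens(prompt_tokens, output_tokens_per_candidate, separate_calls=False):
    """Return (input_tokens, output_tokens) billed under the model above.

    prompt_tokens: tokens in the prompt.
    output_tokens_per_candidate: one output-token count per candidate.
    separate_calls: if True, model one API call per candidate instead of
        one call with candidate_count set.
    """
    n = len(output_tokens_per_candidate)
    # One request bills the prompt once; separate calls bill it once per call.
    input_total = prompt_tokens * n if separate_calls else prompt_tokens
    # Output tokens are summed across candidates in both cases.
    output_total = sum(output_tokens_per_candidate)
    return input_total, output_total


# Hypothetical numbers: a 40-token prompt and three 100-token responses.
print(billed_tokens(40, [100, 100, 100]))                       # (40, 300)
print(billed_tokens(40, [100, 100, 100], separate_calls=True))  # (120, 300)
```

The saving grows with prompt length: a long prompt with short responses benefits most from a single multi-candidate request.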

Common gotcha

Requesting candidate_count=3 does not guarantee you receive 3 candidates. If any candidate triggers safety filtering, that candidate is dropped from the response. You may receive 0–3 results. Always check len(response.candidates) before indexing.

Error recovery

ValueError: Invalid candidate_count
You requested more than 8 candidates. The maximum is 8; reduce candidate_count to 8 or lower.
len(response.candidates) == 0
All candidates were filtered by safety. Rephrase the prompt to remove the content that triggers filtering, then retry. Lowering temperature changes diversity, not filtering.
response.candidates[i] has no .content attribute
The candidate was rejected before generation. Check safety_ratings or finish_reason. You may need to relax the thresholds in safety_settings to allow more responses.
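A defensive extraction helper covers the last two cases in one pass. This is a sketch, not the SDK's own accessor: candidate_texts is a hypothetical name, and the stand-in objects mimic only the fields listed under Response shape.

```python
from types import SimpleNamespace


def candidate_texts(response):
    """Return the text of every candidate that actually carries text parts."""
    texts = []
    for candidate in getattr(response, "candidates", None) or []:
        content = getattr(candidate, "content", None)
        parts = getattr(content, "parts", None) or []
        # Join the text of all parts; parts without text contribute nothing.
        text = "".join(getattr(p, "text", "") or "" for p in parts)
        if text:
            texts.append(text)
    return texts


# Stand-in response (hypothetical data): one usable candidate, two empty ones.
ok = SimpleNamespace(
    content=SimpleNamespace(parts=[SimpleNamespace(text="A ship wakes up.")])
)
blocked = SimpleNamespace(content=None)
empty = SimpleNamespace(content=SimpleNamespace(parts=[]))
response = SimpleNamespace(candidates=[ok, blocked, empty])

print(candidate_texts(response))  # ['A ship wakes up.']
```

An empty return value means no candidate produced usable text, which is the signal to rephrase the prompt and retry.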

Experienced dev note

In production, treat candidate_count as a request, not a guarantee. Build fallback logic: if you ask for 3 candidates and get 1, your downstream system must handle that gracefully. For cost optimization, request multiple candidates once instead of looping with separate calls: one candidate_count=5 request beats five candidate_count=1 calls. Also: higher temperature + multiple candidates is powerful for brainstorming, but ensure you have a selection strategy (e.g., heuristic scoring, human review) or you'll end up using the first one anyway.
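A selection strategy with a fallback can be very small. The sketch below assumes you already have the candidate texts as a list of strings; select_candidate and length_closeness are hypothetical helpers, and the word-count heuristic is only a placeholder for whatever scoring fits your use case.

```python
def select_candidate(texts, score, fallback=None):
    """Pick the highest-scoring text, or return fallback when none arrived."""
    if not texts:
        return fallback
    return max(texts, key=score)


def length_closeness(target_words):
    """Build a scorer that prefers texts closest to target_words words long."""
    def score(text):
        return -abs(len(text.split()) - target_words)
    return score


# Hypothetical candidate texts.
texts = [
    "Too short.",
    "A derelict probe answers a signal it sent itself.",
    "One two three",
]
best = select_candidate(texts, length_closeness(9), fallback="(no candidates returned)")
print(best)  # A derelict probe answers a signal it sent itself.
```

Because the fallback is explicit, downstream code receives a defined value even when every candidate was filtered.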

Check your understanding

You request 5 candidates with temperature=0.8, and the API returns only 2 in the response. Your teammate suggests rerunning with temperature=0.1 to get more candidates. Will this fix the problem, and why or why not?

Answer hint

Temperature affects diversity, not candidate filtering. The missing 3 candidates were likely dropped due to safety violations, not randomness. Lower temperature would make responses more similar to each other, not bypass safety filters. Investigate the 2 returned candidates' <code>safety_ratings</code> and adjust the prompt instead.

VERSION: google-generativeai 0.8.x (stable). The candidate_count parameter is available in all gemini-2.0 and gemini-1.5 models. Earlier gemini-1.0 models have inconsistent support. If you are on an older SDK release, upgrade to 0.8.x.
