candidates: multiple completions
Why this matters
When you need several creative options or diverse approaches, or want to compare the quality of different responses without making multiple API calls, `candidate_count` lets you generate 1–8 responses in a single request. The prompt is billed once, and each candidate's output tokens are billed separately.
Explanation
What it does: The candidate_count field of the generation config passed to generate_content() tells Gemini to generate multiple independent responses and return all of them in the candidates list. You specify a number from 1 to 8, and the API returns up to that many complete responses to the same prompt.
How it works: Each candidate is generated independently with the same prompt, temperature, and system instructions. The API processes them within a single request and returns them together. The finish_reason field on each candidate tells you why that response stopped (e.g., STOP for a natural completion, MAX_TOKENS if it hit the output limit).
When to use it: Use it for brainstorming (several creative ideas), evaluating model behavior (comparing outputs), A/B testing responses, or when a downstream system needs a choice. Avoid it for cost-sensitive, latency-critical, or deterministic tasks where you only want one answer.
Request code
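```python
import os

import google.generativeai as genai

# Reads the key from the environment (raises KeyError if GOOGLE_API_KEY is unset).
genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-2.0-flash')

response = model.generate_content(
    'Write a short sci-fi story premise in 2 sentences.',
    # candidate_count must go inside generation_config, not as a direct keyword argument.
    generation_config=genai.types.GenerationConfig(
        candidate_count=3,
        temperature=0.9,
        max_output_tokens=150,
    ),
)

print(f'Number of candidates: {len(response.candidates)}')
for i, candidate in enumerate(response.candidates):
    print(f'\n--- Candidate {i+1} ---')
    # A filtered candidate can come back with no parts, so guard before indexing.
    if candidate.content.parts:
        print(candidate.content.parts[0].text)
    else:
        print('(no content returned)')
    print(f'Finish reason: {candidate.finish_reason}')
```
Authentication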
Set your API key before running:
```bash
export GOOGLE_API_KEY="your-actual-api-key"
```
Get a key from Google AI Studio (https://aistudio.google.com/apikey). The request code reads the variable with os.environ['GOOGLE_API_KEY'] and passes it to genai.configure(), so a KeyError means the variable is not set in your shell.
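If you want a clearer failure than a bare KeyError, a small check can run before genai.configure(). This is a minimal sketch; require_api_key is a hypothetical helper, not part of the SDK.

```python
import os


def require_api_key(var: str = 'GOOGLE_API_KEY') -> str:
    """Return the API key from the environment, or fail with a readable message.

    Hypothetical helper (not part of google-generativeai).
    """
    key = os.environ.get(var, '').strip()
    if not key:
        raise RuntimeError(
            f'{var} is not set. Export it first, e.g. export {var}="your-actual-api-key"'
        )
    return key
```

In the request code, genai.configure(api_key=require_api_key()) would replace the direct os.environ lookup.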
Response shape
| Field | Description |
|---|---|
| candidates | List of Candidate objects, one per returned candidate (the response itself is a GenerateContentResponse) |
| candidates[i].content | The Content object for this candidate (a role plus a list of parts) |
| candidates[i].content.parts | List of content parts (usually one text part) |
| candidates[i].content.parts[0].text | The generated text string |
| candidates[i].finish_reason | Enum indicating why generation stopped: STOP (natural end), MAX_TOKENS (output limit hit), SAFETY (content filtered), RECITATION (blocked as recitation) |
| candidates[i].safety_ratings | List of safety assessment objects, each with a category and a probability |
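To see the shape in practice, the sketch below walks response.candidates and collects each candidate's finish reason, joined text, and number of safety ratings, using only the attribute names in the table above. The SimpleNamespace objects are hypothetical stand-ins for a real response so the code runs offline; summarize_candidates is an illustrative name.

```python
from types import SimpleNamespace


def summarize_candidates(response):
    """Return one dict per candidate: index, finish reason name, text, rating count."""
    rows = []
    for i, cand in enumerate(response.candidates):
        reason = cand.finish_reason
        # SDK enums expose .name; plain strings (as in the stand-ins) do not.
        reason_name = getattr(reason, 'name', reason)
        content = getattr(cand, 'content', None)
        parts = list(getattr(content, 'parts', None) or [])
        # Join every text part; a filtered candidate may have no parts at all.
        text = ''.join(getattr(p, 'text', '') for p in parts) or None
        rows.append({
            'index': i,
            'finish_reason': reason_name,
            'text': text,
            'safety_ratings': len(getattr(cand, 'safety_ratings', None) or []),
        })
    return rows


# Offline stand-ins mimicking the documented response shape (hypothetical data).
fake_response = SimpleNamespace(candidates=[
    SimpleNamespace(
        content=SimpleNamespace(parts=[SimpleNamespace(text='A colony ship wakes early.')]),
        finish_reason='STOP',
        safety_ratings=[SimpleNamespace(category='HARM_CATEGORY_HARASSMENT',
                                        probability='NEGLIGIBLE')],
    ),
    SimpleNamespace(content=SimpleNamespace(parts=[]), finish_reason='SAFETY',
                    safety_ratings=[]),
])

for row in summarize_candidates(fake_response):
    print(row)
```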
Field guide
candidates: Always check the length of this list. It may hold fewer entries than requested, or entries without usable content, if the API hit errors or safety filters.
finish_reason: Easy to overlook. If it is MAX_TOKENS, the response was truncated; increase max_output_tokens or accept the incomplete answer.
safety_ratings: Each candidate is evaluated independently, so one candidate can be blocked while others pass. This is one reason you may get fewer usable candidates than requested.
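The three checks above can be combined into one pass. This sketch groups candidate indices by finish reason so truncated (MAX_TOKENS) and filtered (SAFETY, RECITATION) candidates stand out, and counts how many requested candidates never arrived. The function name and the bucket names are illustrative choices, and the stand-in objects are hypothetical.

```python
from types import SimpleNamespace

# Illustrative bucket built from the finish reasons described in this guide.
BLOCKED_REASONS = {'SAFETY', 'RECITATION'}


def triage_candidates(candidates, requested):
    """Group candidate indices by outcome and report any shortfall vs. requested."""
    report = {'complete': [], 'truncated': [], 'blocked': [], 'other': []}
    for i, cand in enumerate(candidates):
        # Works for SDK enums (.name) and for plain strings.
        reason = getattr(cand.finish_reason, 'name', cand.finish_reason)
        if reason == 'STOP':
            report['complete'].append(i)
        elif reason == 'MAX_TOKENS':
            report['truncated'].append(i)  # text was cut off at max_output_tokens
        elif reason in BLOCKED_REASONS:
            report['blocked'].append(i)
        else:
            report['other'].append(i)
    # Candidates missing from the list entirely count toward the shortfall.
    report['missing'] = max(requested - len(candidates), 0)
    return report


cands = [SimpleNamespace(finish_reason=r) for r in ('STOP', 'MAX_TOKENS', 'SAFETY')]
print(triage_candidates(cands, requested=5))
```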
Setup trap
The candidate_count parameter must live inside generation_config, not directly in generate_content(). In the google-generativeai SDK, generate_content() has no candidate_count keyword, so generate_content(prompt, candidate_count=3) raises a TypeError instead of producing multiple candidates. Pass it through GenerationConfig(candidate_count=3), as in the request code, or as a dict: generation_config={'candidate_count': 3}.
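One way to avoid both this trap and an out-of-range value is to build the config in one place and always pass it as generation_config. The sketch below returns a plain dict, which the google-generativeai SDK accepts for generation_config alongside GenerationConfig. The 1–8 bound mirrors the range stated in this guide, and build_generation_config is a hypothetical helper.

```python
def build_generation_config(candidate_count=1, temperature=None, max_output_tokens=None):
    """Return a dict suitable for generate_content(..., generation_config=...).

    Hypothetical helper; the 1-8 range follows the limit described in this guide.
    """
    # Reject bools explicitly, since bool is a subclass of int in Python.
    if (isinstance(candidate_count, bool) or not isinstance(candidate_count, int)
            or not 1 <= candidate_count <= 8):
        raise ValueError(f'candidate_count must be an int from 1 to 8, got {candidate_count!r}')
    config = {'candidate_count': candidate_count}
    # Only include optional fields that were actually set.
    if temperature is not None:
        config['temperature'] = temperature
    if max_output_tokens is not None:
        config['max_output_tokens'] = max_output_tokens
    return config


cfg = build_generation_config(candidate_count=3, temperature=0.9, max_output_tokens=150)
print(cfg)
```

The result would then be passed as model.generate_content(prompt, generation_config=cfg).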
Cost
Input (prompt) tokens are billed once per request, whatever candidate_count you set. Output tokens are summed across all candidates, so 3 responses of 100 tokens each bill as 300 output tokens. Three separate API calls would produce the same output tokens but bill the prompt three times, so one multi-candidate request is cheaper.
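The arithmetic can be sketched as follows. The per-million-token prices are placeholders, not real Gemini rates; substitute the current prices for your model. What matters is the structure: input tokens are counted once per call, output tokens once per candidate.

```python
def billed_tokens(prompt_tokens, output_tokens_per_candidate, candidate_count, calls=1):
    """Return (input_tokens, output_tokens) billed across `calls` requests."""
    input_tokens = prompt_tokens * calls  # the prompt is billed once per request
    output_tokens = output_tokens_per_candidate * candidate_count * calls
    return input_tokens, output_tokens


def estimate_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Cost in the prices' currency; prices are per million tokens."""
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


# Placeholder prices (NOT real Gemini rates): 1.00 per 1M input, 4.00 per 1M output.
IN_PRICE, OUT_PRICE = 1.00, 4.00

one_call = billed_tokens(1000, 100, candidate_count=3)              # (1000, 300)
three_calls = billed_tokens(1000, 100, candidate_count=1, calls=3)  # (3000, 300)
print(one_call, estimate_cost(*one_call, IN_PRICE, OUT_PRICE))
print(three_calls, estimate_cost(*three_calls, IN_PRICE, OUT_PRICE))
```

Both options bill 300 output tokens; the difference is the 2,000 extra prompt tokens the separate calls pay for.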
Common gotcha
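Requesting candidate_count=3 does not guarantee 3 usable candidates. A candidate that triggers safety filtering may be dropped from the response or returned with finish_reason SAFETY and no content, and a blocked prompt returns no candidates at all. You may receive 0–3 usable results. Always check len(response.candidates), and each candidate's content.parts, before indexing.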
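A defensive version of the length check described above might look like this. It never assumes a candidate exists or has parts, and it reports how many of the requested candidates were actually usable. usable_texts is a hypothetical helper, and the stand-ins are fake data for running it offline.

```python
from types import SimpleNamespace


def usable_texts(response, requested):
    """Return (texts, shortfall): the text of every candidate that has content parts."""
    texts = []
    for cand in getattr(response, 'candidates', None) or []:
        content = getattr(cand, 'content', None)
        parts = getattr(content, 'parts', None) or []
        text = ''.join(getattr(p, 'text', '') for p in parts)
        if text:
            texts.append(text)
    # Shortfall covers dropped candidates and ones returned without content.
    return texts, max(requested - len(texts), 0)


fake = SimpleNamespace(candidates=[
    SimpleNamespace(content=SimpleNamespace(parts=[SimpleNamespace(text='Premise one.')])),
    SimpleNamespace(content=SimpleNamespace(parts=[])),  # e.g. a filtered candidate
])
texts, shortfall = usable_texts(fake, requested=3)
print(texts, shortfall)
```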
Error recovery
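ValueError: Invalid candidate_count. Use an integer from 1 to 8.
len(response.candidates) == 0. The prompt or every candidate was filtered; inspect response.prompt_feedback and the prompt itself, then rephrase.
response.candidates[i] has no .content attribute. That candidate was blocked or returned empty; check its finish_reason and safety_ratings and skip it rather than indexing into it.
Experienced dev note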
In production, treat candidate_count as a request, not a guarantee. Build fallback logic: if you ask for 3 candidates and get 1, your downstream system must handle that gracefully. For cost, request multiple candidates once instead of looping over separate calls: one candidate_count=5 request beats five candidate_count=1 calls because the prompt is billed only once. Also, higher temperature plus multiple candidates works well for brainstorming, but decide on a selection strategy (e.g., heuristic scoring or human review), or you'll end up using the first candidate anyway.
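A selection step might look like the sketch below. It scores whatever candidates arrived, so it behaves the same whether you got 5, 1, or 0, and returns None when there is nothing to choose. The heuristic (prefer texts near a target length, skip near-duplicates) is purely illustrative; replace it with whatever "best" means for your task.

```python
def pick_best(texts, target_chars=200):
    """Return the highest-scoring text, or None when no candidates arrived.

    Illustrative heuristic: prefer texts close to target_chars and skip duplicates.
    """
    seen = set()
    best, best_score = None, None
    for text in texts:
        key = ' '.join(text.lower().split())  # normalize case/whitespace for dedup
        if key in seen:
            continue
        seen.add(key)
        score = -abs(len(text) - target_chars)  # closer to the target is better
        if best_score is None or score > best_score:
            best, best_score = text, score
    return best


print(pick_best(['short', 'a' * 190, 'b' * 400]))  # the 190-character text wins
print(pick_best([]))  # None: downstream code must handle "no result"
```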
Check your understanding
You request 5 candidates with temperature=0.8, and the API returns only 2 in the response. Your teammate suggests rerunning with temperature=0.1 to get more candidates. Will this fix the problem, and why or why not?
Answer hint
Temperature affects diversity, not candidate filtering. The 3 missing candidates were most likely removed by safety filters, not lost to randomness. A lower temperature would make the responses more similar to each other; it would not bypass safety filters. Inspect the `safety_ratings` of the 2 returned candidates and adjust the prompt instead.
The candidate_count parameter is available in the Gemini 2.0 and Gemini 1.5 models. Gemini 1.0 models have inconsistent support. If you are using an older release of the google-generativeai Python SDK, upgrade to 0.8.x.