How many tokens can Gemini process?
Quick answer
Google's gemini-1.5-pro model accepts up to 2,097,152 input tokens per request, while gemini-1.5-flash and gemini-2.0-flash accept up to 1,048,576. Output is capped separately at 8,192 tokens per request; input and output limits do not share a budget. These figures change between releases, so check Google's model documentation for current values.

Prerequisites

- Python 3.9+
- A Google AI (Gemini) API key
- pip install google-generativeai
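Because the published limits change between model releases, it can help to keep them in one lookup table for local pre-flight checks. A minimal sketch — the figures below reflect Google's model documentation at the time of writing and should be re-verified before relying on them:

```python
# Input context-window limits (tokens); re-check against Google's
# current model documentation, as these values change between releases.
INPUT_LIMITS = {
    "gemini-1.5-pro": 2_097_152,
    "gemini-1.5-flash": 1_048_576,
    "gemini-2.0-flash": 1_048_576,
}
OUTPUT_LIMIT = 8_192  # separate per-request output cap for all three models


def fits_context(model: str, prompt_tokens: int) -> bool:
    """Return True if a prompt of `prompt_tokens` fits the model's input window."""
    return prompt_tokens <= INPUT_LIMITS[model]


print(fits_context("gemini-1.5-flash", 900_000))    # True
print(fits_context("gemini-1.5-flash", 2_000_000))  # False
```

This keeps the limit values in one place, so updating them when Google revises a model is a one-line change.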
Setup
Install the official Google Generative AI SDK (google-generativeai) and set your API key as an environment variable to authenticate requests.

pip install google-generativeai
export GOOGLE_API_KEY="your-api-key"

Step by step
Use the Google Generative AI SDK to send a generate-content request to a Gemini model, keeping the output cap in mind.
import os

import google.generativeai as genai

# Authenticate with the API key from the environment
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# Define the model and prompt
model = genai.GenerativeModel("gemini-1.5-pro")
prompt = "Explain the theory of relativity in simple terms."

# Send the request, capping the reply length
response = model.generate_content(
    prompt,
    generation_config=genai.GenerationConfig(
        max_output_tokens=1000  # well below the 8,192-token output cap
    ),
)

print(response.text)

Output
The theory of relativity, developed by Albert Einstein, explains how space and time are linked for objects moving at a consistent speed in a straight line...
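For inputs that approach the context window, a common pattern is to split the text into chunks and send each one separately. A rough sketch using a ~4-characters-per-token heuristic for English text (the authoritative count comes from the SDK's count_tokens method, which requires an API call):

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)


def chunk_text(text: str, max_tokens: int) -> list[str]:
    """Split `text` into pieces whose estimated token count fits `max_tokens`."""
    max_chars = max_tokens * 4
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


# 50,000 characters -> ~12,500 estimated tokens, split into ~1,000-token pieces
chunks = chunk_text("word " * 10_000, max_tokens=1_000)
print(len(chunks))  # 13
```

The heuristic is deliberately conservative for budgeting; for exact numbers, count with the API before sending.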
Common variations
You can use other Gemini models with different context windows, such as gemini-1.5-flash or gemini-2.0-flash (both around 1,048,576 input tokens). The 8,192-token output cap applies to all of them, so keep max_output_tokens at or below that value.
response = genai.GenerativeModel("gemini-2.0-flash").generate_content(
    prompt,
    generation_config=genai.GenerationConfig(
        max_output_tokens=2000  # within the 8,192-token output cap
    ),
)

print(response.text)

Output
The theory of relativity revolutionized physics by introducing the concepts of spacetime and the equivalence of mass and energy...
Troubleshooting
If your prompt exceeds the model's input limit, the API rejects the request with an invalid-argument error; shorten the prompt, split it into chunks, or switch to a model with a larger context window. If the reply cuts off mid-sentence, generation hit max_output_tokens; raise the cap or request a shorter answer.
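Rather than waiting for the API to reject an oversized request, you can fail fast locally. A sketch using the same ~4-characters-per-token heuristic; the default limit here is an assumption to re-check against the current documentation:

```python
def check_prompt(prompt: str, input_limit: int = 1_048_576) -> None:
    """Raise ValueError before sending a prompt whose estimated size exceeds the limit."""
    estimated = max(1, len(prompt) // 4)  # ~4 chars per token heuristic
    if estimated > input_limit:
        raise ValueError(
            f"Prompt is ~{estimated} tokens, over the {input_limit}-token input limit"
        )


check_prompt("short prompt")  # passes silently
try:
    check_prompt("x" * 8_000_000)  # ~2,000,000 estimated tokens
except ValueError as err:
    print(err)
```

Catching oversized prompts before the network round-trip gives a clearer error message and avoids wasted requests.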
Key Takeaways
- gemini-1.5-pro supports roughly 2 million input tokens per request; gemini-1.5-flash and gemini-2.0-flash support roughly 1 million.
- Output is capped separately at 8,192 tokens per request; input and output limits do not share a budget.
- Set max_output_tokens at or below the output cap to avoid truncated replies.
- Check the official Google documentation for current limits, which change between model releases.