How-to · Beginner · 3 min read

How many tokens can Gemini process?

Quick answer
Google's gemini-1.5-pro model accepts up to 2,097,152 input tokens per request (about 2 million), while gemini-1.5-flash and gemini-2.0-flash accept up to 1,048,576 (about 1 million). Output is limited separately: all three models generate at most 8,192 tokens per request.

Prerequisites

  • Python 3.9+
  • A Google AI Studio API key with access to Gemini models
  • pip install google-generativeai

Setup

Install the official Google Generative AI SDK for Python (google-generativeai) and set your API key in the GOOGLE_API_KEY environment variable; the client uses it to authenticate requests.

bash
pip install google-generativeai
export GOOGLE_API_KEY="your-api-key"

Step by step

Use the SDK to send a text-generation request to a Gemini model, keeping the prompt and the requested output within the token limits.

python
import os
import google.generativeai as genai

# Authenticate using the API key from the environment
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# Choose the model and prompt
model = genai.GenerativeModel("gemini-1.5-pro")
prompt = "Explain the theory of relativity in simple terms."

# Send the request, capping the completion at 1,000 tokens
# (well below the model's 8,192-token output limit)
response = model.generate_content(
    prompt,
    generation_config=genai.GenerationConfig(max_output_tokens=1000),
)

print(response.text)
output
The theory of relativity, developed by Albert Einstein, explains how space and time are linked for objects moving at a consistent speed in a straight line...
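After a call succeeds, you can check how many tokens it actually consumed. A minimal sketch, assuming the google-generativeai package, whose responses carry a usage_metadata field with per-request counts; the total_tokens helper is just illustrative arithmetic:

```python
import os

def total_tokens(prompt_tokens: int, completion_tokens: int) -> int:
    """A request's total usage is prompt plus completion tokens."""
    return prompt_tokens + completion_tokens

# Requires a live API key; this part is skipped when GOOGLE_API_KEY is not set.
if os.environ.get("GOOGLE_API_KEY"):
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-pro")
    response = model.generate_content("Explain the theory of relativity in simple terms.")

    # usage_metadata reports prompt, completion, and total token counts
    usage = response.usage_metadata
    print("prompt tokens:    ", usage.prompt_token_count)
    print("completion tokens:", usage.candidates_token_count)
    print("total tokens:     ", total_tokens(usage.prompt_token_count,
                                             usage.candidates_token_count))
```

Logging these counts per request is an easy way to spot prompts that are drifting toward a model's limits before they start failing.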

Common variations

You can swap in other Gemini models with different limits: gemini-1.5-flash and gemini-2.0-flash accept up to 1,048,576 input tokens per request, and gemini-1.5-pro up to 2,097,152, while all three cap output at 8,192 tokens. Cap your requested output accordingly.

python
import google.generativeai as genai  # configured as in Setup

response = genai.GenerativeModel("gemini-2.0-flash").generate_content(
    "Explain the theory of relativity in simple terms.",
    generation_config=genai.GenerationConfig(max_output_tokens=2000),
)
print(response.text)
output
The theory of relativity revolutionized physics by introducing the concepts of spacetime and the equivalence of mass and energy...

Troubleshooting

If your prompt exceeds the model's input limit, the API rejects the request with an error indicating it is too large; shorten the prompt or split it across multiple requests. If a completion stops mid-sentence, it has hit the output cap instead; raise the cap, up to the model's 8,192-token output limit.
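One way to avoid oversized requests is to count tokens before sending. A minimal sketch, assuming the google-generativeai package (count_tokens makes a lightweight API call without generating text); the limit table is hard-coded from Google's published figures and may drift, so verify it against the current docs:

```python
import os

# Published input-token limits (assumed current; verify against Google's docs)
INPUT_TOKEN_LIMITS = {
    "gemini-1.5-pro": 2_097_152,
    "gemini-1.5-flash": 1_048_576,
    "gemini-2.0-flash": 1_048_576,
}

def within_input_limit(model_name: str, token_count: int) -> bool:
    """True if a prompt of token_count tokens fits the model's input window."""
    return token_count <= INPUT_TOKEN_LIMITS[model_name]

# Requires a live API key; this part is skipped when GOOGLE_API_KEY is not set.
if os.environ.get("GOOGLE_API_KEY"):
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-pro")
    prompt = "Explain the theory of relativity in simple terms."

    # count_tokens returns the prompt's token count without generating text
    n = model.count_tokens(prompt).total_tokens
    if within_input_limit("gemini-1.5-pro", n):
        print(model.generate_content(prompt).text)
    else:
        print(f"Prompt is {n} tokens, over the gemini-1.5-pro input limit")
```

The pre-check costs one extra round trip, which is usually worth it for prompts built from user-supplied or retrieved documents whose size you cannot predict.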

Key Takeaways

  • Use gemini-1.5-pro for prompts up to 2,097,152 input tokens per request.
  • Use gemini-1.5-flash or gemini-2.0-flash for prompts up to 1,048,576 input tokens per request.
  • Input and output limits are separate; all three models cap output at 8,192 tokens.
  • Cap your requested output at or below 8,192 tokens to avoid truncated completions.
  • Check official Google documentation for updates on token limits.
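Rather than hard-coding the figures above, you can ask the API for a model's current limits. A minimal sketch, assuming the google-generativeai package, whose get_model call returns model metadata including input_token_limit and output_token_limit fields:

```python
import os

def describe_limits(name: str, input_limit: int, output_limit: int) -> str:
    """Render a model's token limits as a one-line summary."""
    return f"{name}: {input_limit:,} input / {output_limit:,} output tokens"

# Requires a live API key; this part is skipped when GOOGLE_API_KEY is not set.
if os.environ.get("GOOGLE_API_KEY"):
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    # get_model fetches the model's metadata, including its current limits,
    # so the numbers never go stale in your code
    info = genai.get_model("models/gemini-1.5-pro")
    print(describe_limits(info.name, info.input_token_limit, info.output_token_limit))
```

Querying limits at startup keeps long-running services accurate even when Google revises a model's context window.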
Verified 2026-04 · gemini-1.5-pro, gemini-1.5-flash, gemini-2.0-flash