RESOURCE_EXHAUSTED
google.api_core.exceptions.ResourceExhausted (Quota exceeded on free tier)
Stack trace
google.api_core.exceptions.ResourceExhausted: 429 Resource has been exhausted (e.g. check quota).
Details: Quota exceeded for quota metric 'requests-per-minute' and 'quota_project' 'projects/YOUR_PROJECT_ID'.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "app.py", line 12, in <module>
    response = model.generate_content("Hello")
  File "google/generativeai/client.py", line 156, in generate_content
    return response
  File "google/api_core/gapic_v1/client_info.py", line 87, in _request_retry_wrapper
    raise exceptions.ResourceExhausted("Quota exceeded")
google.api_core.exceptions.ResourceExhausted: 429 Resource has been exhausted.
Why it happens
The free tier of Google's Gemini API enforces strict rate limits: 60 requests per minute and a daily quota cap. Exceeding either one, by sending requests too quickly or by accumulating too much usage over the day, makes the API return a 429 ResourceExhausted error. This is deliberate quota enforcement intended to prevent abuse. The error carries metadata naming the quota metric that was exceeded (requests-per-minute, tokens-per-day, and so on).
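As a minimal sketch of reading that metadata, the snippet below pulls the quota metric name out of an error message shaped like the "Details:" line in the stack trace above. The helper name extract_quota_metric is hypothetical, and the message wording is assumed to match that sample; real messages may differ, so treat a missing match as "unknown" rather than as an error.

```python
import re
from typing import Optional

# Matches the "quota metric '<name>'" fragment seen in the sample error above.
# The exact wording of real messages is an assumption.
_QUOTA_METRIC = re.compile(r"quota metric '([^']+)'")


def extract_quota_metric(message: str) -> Optional[str]:
    """Return the quota metric named in an error message, or None if absent."""
    match = _QUOTA_METRIC.search(message)
    return match.group(1) if match else None


sample = (
    "429 Resource has been exhausted (e.g. check quota). "
    "Details: Quota exceeded for quota metric 'requests-per-minute' "
    "and 'quota_project' 'projects/YOUR_PROJECT_ID'."
)
print(extract_quota_metric(sample))  # prints: requests-per-minute
```

In an except block you could pass str(exc) to this helper to decide whether a short wait (per-minute metric) or a long one (per-day metric) makes sense.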
Detection
Monitor your request rate and log every quota-related response so you can see when and how often 429s occur. Back off preemptively when a response includes a retry-after header. Set up Cloud Monitoring alerts for quota exhaustion events in your Google Cloud project.
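The logging part of this advice can be sketched as a thin wrapper that counts calls and logs each quota error before re-raising it. QuotaLogger and QuotaError are hypothetical names introduced for illustration; QuotaError is only a stand-in so the sketch runs without the Google libraries, and in real code you would pass google.api_core.exceptions.ResourceExhausted as quota_exc.

```python
import logging
from typing import Callable, Type, TypeVar

T = TypeVar("T")
logger = logging.getLogger("quota")


class QuotaError(Exception):
    """Hypothetical stand-in for google.api_core.exceptions.ResourceExhausted."""


class QuotaLogger:
    """Count calls and log every quota-related failure before re-raising it."""

    def __init__(self, quota_exc: Type[BaseException] = QuotaError):
        self.quota_exc = quota_exc
        self.total = 0          # all calls attempted
        self.quota_errors = 0   # calls that failed with the quota exception

    def call(self, fn: Callable[[], T]) -> T:
        self.total += 1
        try:
            return fn()
        except self.quota_exc as exc:
            self.quota_errors += 1
            logger.warning(
                "quota error %d of %d calls: %s",
                self.quota_errors, self.total, exc,
            )
            raise  # let the caller's retry logic decide what to do
```

A call site would look like tracker.call(lambda: model.generate_content(prompt)), with the counters exported to whatever dashboard you already use.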
Causes & fixes
Sending more than 60 requests per minute on free tier
Implement exponential backoff retry logic. When ResourceExhausted is caught, wait 60+ seconds before retrying, and double the wait time for each subsequent retry. Use the retry-after header from the response if available.
Exceeding daily request quota (2,000 requests/day on free tier as of 2026)
Upgrade to a paid Gemini API plan (Google AI Studio Pro, or Vertex AI with committed use discounts). The free tier is designed for development and testing only, not production.
Concurrent requests from multiple processes/threads hitting quota simultaneously
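Implement a request queue with rate limiting. A threading.Semaphore or an async queue can serialize requests, but a semaphore by itself caps concurrency, not rate, so also enforce a minimum spacing between requests to make sure you never exceed 60 req/min. Example: limit to 1 request per second (60/min).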
Using free tier for production or high-traffic application
Migrate to Vertex AI Generative AI API with a paid Google Cloud project. Vertex AI has higher quotas and better SLA guarantees. Update credentials: use service account JSON instead of API key.
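For the concurrent-requests case, one way to enforce spacing across threads is a small limiter that hands out time slots under a lock. This is a sketch, not the library's own mechanism: MinIntervalLimiter is a hypothetical name, and the 1-second interval in the usage note simply mirrors the 60 req/min figure used on this page. The clock and sleep parameters exist only so the class can be tested without real waiting.

```python
import threading
import time


class MinIntervalLimiter:
    """Allow at most one call per `interval` seconds across all threads."""

    def __init__(self, interval: float, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_slot = 0.0  # earliest time the next call may start

    def acquire(self) -> float:
        """Reserve the next free slot, wait for it, and return the delay."""
        with self._lock:
            now = self._clock()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self.interval
        # Sleep outside the lock so other threads can reserve later slots.
        delay = slot - now
        if delay > 0:
            self._sleep(delay)
        return delay
```

Create one shared limiter, for example limiter = MinIntervalLimiter(1.0), and call limiter.acquire() in every worker immediately before model.generate_content(...). Separate processes do not share this state, so multi-process setups need an external coordinator instead.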
Code: broken vs fixed
import google.generativeai as genai
import os

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-2.0-flash')

# BAD: No retry logic, so the loop hits the quota and crashes
for i in range(100):
    response = model.generate_content(f"Request {i}")  # Line that triggers RESOURCE_EXHAUSTED
    print(response.text)

import google.generativeai as genai
import os
import time
from google.api_core.exceptions import ResourceExhausted

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-2.0-flash')

def generate_with_retry(prompt, max_retries=3):
    """Generate content with exponential backoff on quota exhaustion."""
    wait_time = 60  # Start with 60 seconds
    for attempt in range(max_retries):
        try:
            response = model.generate_content(prompt)  # FIXED: Wrapped in retry logic
            return response
        except ResourceExhausted:
            if attempt < max_retries - 1:
                print(f"Quota exceeded. Waiting {wait_time}s before retry...")
                time.sleep(wait_time)
                wait_time *= 2  # Exponential backoff
            else:
                print("Max retries exceeded. Upgrade to paid plan.")
                raise

# FIXED: Rate-limited loop (max 1 req/sec = 60 req/min)
for i in range(10):
    response = generate_with_retry(f"Request {i}")
    print(response.text)
    time.sleep(1)  # Ensure we stay under 60 req/min
Workaround
If you can't wait for retries, spread the work out: split your requests across multiple days so that each day stays within the 2,000 requests/day free tier limit. Also cache API responses aggressively to avoid redundant calls: store previous responses and check the cache before hitting the API.
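The caching idea could look roughly like the sketch below, which stores response text on disk keyed by a hash of the model name and prompt. ResponseCache and its fetch parameter are hypothetical names; fetch stands for whatever function actually calls the API and returns text. The sketch assumes identical prompts should return identical answers, which is not true if you rely on sampling variety.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


class ResponseCache:
    """Cache response text on disk, keyed by a hash of model name and prompt."""

    def __init__(self, directory: str):
        self.dir = Path(directory)
        self.dir.mkdir(parents=True, exist_ok=True)

    def _path(self, model: str, prompt: str) -> Path:
        key = hashlib.sha256(f"{model}\n{prompt}".encode("utf-8")).hexdigest()
        return self.dir / f"{key}.json"

    def get_or_call(self, model: str, prompt: str,
                    fetch: Callable[[str], str]) -> str:
        """Return the cached text if present; otherwise call fetch and store it."""
        path = self._path(model, prompt)
        if path.exists():
            return json.loads(path.read_text(encoding="utf-8"))["text"]
        text = fetch(prompt)  # only reached on a cache miss
        path.write_text(
            json.dumps({"prompt": prompt, "text": text}), encoding="utf-8"
        )
        return text
```

With the retry helper from the fixed code above, a call might be cache.get_or_call('gemini-2.0-flash', prompt, lambda p: generate_with_retry(p).text), so that only cache misses count against the quota.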
Prevention
For production, upgrade to Google AI Studio Pro or Vertex AI with a paid project; the free tier is not suitable for real applications. Implement request queuing with a rate limiter (threading.Semaphore or asyncio.Semaphore, combined with a delay between requests) to keep the rate at or below 1 request per second. Vertex AI offers a 100 req/min base quota, which can be increased via quota requests. Monitor quota via Cloud Monitoring dashboards and set alerts at 70% utilization.
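Cloud Monitoring is the place for the alerts described above; as a complement, a client-side check can flag high utilization before the API starts rejecting requests. The sketch below counts requests in a sliding window and reports when usage reaches a percentage of the limit. QuotaUsageMonitor is a hypothetical name, and the limit and 70% threshold are taken from the figures on this page rather than from any API.

```python
import time
from collections import deque


class QuotaUsageMonitor:
    """Track requests in a sliding window and flag high utilization."""

    def __init__(self, limit: int, window: float = 60.0,
                 alert_percent: int = 70, clock=time.monotonic):
        self.limit = limit                  # allowed requests per window
        self.window = window                # window length in seconds
        self.alert_percent = alert_percent  # alert threshold, in percent
        self._clock = clock
        self._stamps = deque()              # timestamps of recent requests

    def record(self) -> bool:
        """Record one request; return True if the alert threshold is reached."""
        now = self._clock()
        self._stamps.append(now)
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        # Integer comparison avoids floating-point rounding at the threshold.
        return len(self._stamps) * 100 >= self.limit * self.alert_percent
```

Calling monitor.record() before each request and logging or slowing down when it returns True gives an early warning; it only sees traffic from the current process, so it does not replace project-level monitoring.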