High severity · HTTP 429 · Intermediate · Fix: 5-15 min

RESOURCE_EXHAUSTED

google.api_core.exceptions.ResourceExhausted (Quota exceeded on free tier)

What this error means

The Gemini API rejected your request because you exceeded the free-tier rate limit (60 requests per minute) or the daily quota, so it returned a RESOURCE_EXHAUSTED error.

Stack trace

traceback

google.api_core.exceptions.ResourceExhausted: 429 Resource has been exhausted (e.g. check quota).

Details: Quota exceeded for quota metric 'requests-per-minute' and 'quota_project' 'projects/YOUR_PROJECT_ID'.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "app.py", line 12, in <module>
    response = model.generate_content("Hello")
  File "google/generativeai/client.py", line 156, in generate_content
    return response
  File "google/api_core/gapic_v1/client_info.py", line 87, in _request_retry_wrapper
    raise exceptions.ResourceExhausted("Quota exceeded")
google.api_core.exceptions.ResourceExhausted: 429 Resource has been exhausted.

QUICK FIX

Wrap generate_content() in a try/except ResourceExhausted block with exponential backoff retry (wait 60 seconds, then double for each retry) and upgrade to a paid plan for production.

Why it happens

Google's free tier Gemini API has strict rate limits: 60 requests per minute and a daily quota cap. When you exceed these limits (either by sending requests too fast or hitting cumulative daily usage), the API returns a 429 ResourceExhausted error. This is intentional quota enforcement to prevent abuse. The error includes metadata showing which quota metric was exceeded (requests-per-minute, tokens-per-day, etc.).
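To act on that metadata, it can help to pull the quota metric name out of the error text. The following is a minimal sketch that assumes the message looks like the "Details:" line in the stack trace above; the exact wording may differ between library versions, and `extract_quota_metric` is a hypothetical helper, not part of any Google library.

```python
import re

# Pattern inferred from the sample "Details:" line in the stack trace;
# the real message format is an assumption and may vary by version.
_QUOTA_METRIC_RE = re.compile(r"quota metric '([^']+)'")


def extract_quota_metric(error_message):
    """Return the quota metric named in an error message, or None if absent."""
    match = _QUOTA_METRIC_RE.search(error_message)
    return match.group(1) if match else None


message = (
    "429 Resource has been exhausted (e.g. check quota). "
    "Details: Quota exceeded for quota metric 'requests-per-minute' "
    "and 'quota_project' 'projects/YOUR_PROJECT_ID'."
)
print(extract_quota_metric(message))  # requests-per-minute
```

In an `except ResourceExhausted as exc:` handler you could pass `str(exc)` to this helper and log the result, which makes it easier to tell a per-minute limit from a daily one.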

Detection

Monitor your request rate and add logging to track quota-related responses. Implement preemptive backoff when responses include retry-after headers. Set up Cloud Monitoring alerts for quota exhaustion events in your Google Cloud project.
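One way to monitor the request rate inside the application is a sliding-window counter that logs a warning as you approach the limit. This is a sketch only: `RequestRateMonitor` is a hypothetical class, the 60-request limit comes from the figure quoted on this page, and the 70% warning threshold mirrors the alert level suggested under Prevention. It is not thread-safe as written.

```python
import logging
import time
from collections import deque

logger = logging.getLogger("gemini_quota")


class RequestRateMonitor:
    """Counts requests in a sliding time window and warns near the limit."""

    def __init__(self, limit=60, window_seconds=60.0, warn_ratio=0.7,
                 clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.warn_ratio = warn_ratio
        self.clock = clock  # Injectable so the class can be tested without waiting.
        self._timestamps = deque()

    def record(self):
        """Record one request and return window utilization (1.0 means at the limit)."""
        now = self.clock()
        self._timestamps.append(now)
        # Drop requests that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        utilization = len(self._timestamps) / self.limit
        if utilization >= self.warn_ratio:
            logger.warning("Request rate at %.0f%% of limit (%d/%d)",
                           utilization * 100, len(self._timestamps), self.limit)
        return utilization
```

Call `monitor.record()` immediately before each `generate_content()` call; the returned utilization can also be used to slow down before the API starts rejecting requests.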

Causes & fixes

Sending more than 60 requests per minute on free tier

✓ Fix

Implement exponential backoff retry logic. When ResourceExhausted is caught, wait 60+ seconds before retrying, and double the wait time for each subsequent retry. Use the retry-after header from the response if available.

Exceeding daily request quota (2,000 requests/day on free tier as of 2026)

✓ Fix

Upgrade to a paid Gemini API plan (Google AI Studio Pro or use Vertex AI with committed use discounts). Free tier is designed for development/testing only, not production.

Concurrent requests from multiple processes/threads hitting quota simultaneously

✓ Fix

Implement a request queue with rate limiting. A threading.Semaphore or async queue can serialize requests, but a semaphore alone only bounds concurrency, so pair it with a minimum delay between calls to ensure you never exceed 60 req/min. Example: limit to 1 request per second (60/min).

Using free tier for production or high-traffic application

✓ Fix

Migrate to Vertex AI Generative AI API with a paid Google Cloud project. Vertex AI has higher quotas and better SLA guarantees. Update credentials: use service account JSON instead of API key.
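The rate-limiting fix for concurrent callers can be sketched as a small thread-safe limiter that spaces calls at least one second apart. `MinIntervalLimiter` is a hypothetical helper written for illustration; it only coordinates threads within a single process, so separate processes would need a shared limiter (an assumption this sketch does not address).

```python
import threading
import time


class MinIntervalLimiter:
    """Thread-safe limiter that spaces calls at least `interval` seconds apart."""

    def __init__(self, interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def acquire(self):
        """Reserve the next free time slot, sleep until it arrives, return the wait."""
        with self._lock:
            now = self._clock()
            slot = max(now, self._next_allowed)
            # Reserve the slot under the lock so two threads never share one.
            self._next_allowed = slot + self.interval
        delay = slot - now
        if delay > 0:
            self._sleep(delay)  # Sleep outside the lock so others can reserve slots.
        return delay
```

Create one limiter per process, for example `limiter = MinIntervalLimiter(interval=1.0)`, and call `limiter.acquire()` right before every `model.generate_content(...)` call. Reserving slots under the lock and sleeping outside it keeps waiting threads from blocking each other's bookkeeping.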

Code: broken vs fixed

Broken - triggers the error

python

import google.generativeai as genai
import os

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-2.0-flash')

# BAD: No retry logic — hits quota and crashes
for i in range(100):
    response = model.generate_content(f"Request {i}")  # Line that triggers RESOURCE_EXHAUSTED
    print(response.text)

Fixed - works correctly

python

import google.generativeai as genai
import os
import time
from google.api_core.exceptions import ResourceExhausted

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-2.0-flash')

def generate_with_retry(prompt, max_retries=3):
    """Generate content with exponential backoff on quota exhaustion."""
    wait_time = 60  # Start with 60 seconds
    for attempt in range(max_retries):
        try:
            response = model.generate_content(prompt)  # FIXED: Wrapped in retry logic
            return response
        except ResourceExhausted:
            if attempt < max_retries - 1:
                print(f"Quota exceeded. Waiting {wait_time}s before retry...")
                time.sleep(wait_time)
                wait_time *= 2  # Exponential backoff
            else:
                print("Max retries exceeded. Upgrade to paid plan.")
                raise

# FIXED: Rate-limited loop (max 1 req/sec = 60 req/min)
for i in range(10):
    response = generate_with_retry(f"Request {i}")
    print(response.text)
    time.sleep(1)  # Ensure we stay under 60 req/min

Added ResourceExhausted exception handling with exponential backoff retry (60 s, then 120 s with the default max_retries=3; each additional retry doubles the wait again) and rate-limited the request loop to 1 request per second to stay within free tier limits.
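The fix text earlier on this page also suggests honoring a server-provided retry delay when one is available. The variant below is a rough sketch of that idea. How a retry delay is exposed on the exception is not shown on this page, so `suggested_delay` is a hypothetical hook you would have to supply yourself; the small random jitter is an addition not taken from the source, intended to keep parallel workers from retrying in lockstep.

```python
import random
import time


def call_with_backoff(call, retryable, max_retries=3, base_delay=60.0,
                      suggested_delay=None, sleep=time.sleep):
    """Run `call()`, retrying on `retryable` exceptions with doubling waits.

    `suggested_delay` is a hypothetical hook: a function that receives the
    exception and returns a server-suggested wait in seconds, or None.
    """
    delay = base_delay
    for attempt in range(max_retries):
        try:
            return call()
        except retryable as exc:
            if attempt == max_retries - 1:
                raise  # Out of attempts: surface the original error.
            hinted = suggested_delay(exc) if suggested_delay else None
            wait = hinted if hinted is not None else delay
            # Add up to 10% jitter (assumption, not from the source).
            sleep(wait + random.uniform(0, wait * 0.1))
            delay *= 2  # Exponential backoff for the next attempt.
```

With the imports from the fixed example, usage might look like `call_with_backoff(lambda: model.generate_content(prompt), ResourceExhausted)`.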

⚠ Workaround

If you can't wait for retries, use batch processing with daily quotas: split your requests across multiple days (2,000 requests/day free tier limit), or cache API responses aggressively to avoid redundant calls. Store previous responses and check cache before hitting the API.
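The caching part of this workaround could look roughly like the sketch below. `PromptCache` is a hypothetical helper, the JSON file layout is an arbitrary choice, and the `generate` argument stands in for whatever function actually calls the API; it assumes the cached value is a plain string such as `response.text`.

```python
import hashlib
import json
from pathlib import Path


class PromptCache:
    """Stores responses on disk, keyed by a hash of model name and prompt."""

    def __init__(self, path):
        self.path = Path(path)
        # Load earlier responses if the cache file already exists.
        self._data = json.loads(self.path.read_text()) if self.path.exists() else {}

    @staticmethod
    def _key(model_name, prompt):
        return hashlib.sha256(f"{model_name}\n{prompt}".encode("utf-8")).hexdigest()

    def get_or_generate(self, model_name, prompt, generate):
        """Return the cached text, calling `generate(prompt)` only on a miss."""
        key = self._key(model_name, prompt)
        if key not in self._data:
            self._data[key] = generate(prompt)
            self.path.write_text(json.dumps(self._data))  # Persist after each miss.
        return self._data[key]
```

A call such as `cache.get_or_generate("gemini-2.0-flash", prompt, lambda p: model.generate_content(p).text)` then hits the API only for prompts that have not been seen before. Including the model name in the key avoids returning one model's answer for another.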

✓ Prevention

For production, upgrade to Google AI Studio Pro or Vertex AI with a paid project; the free tier is designed for development and testing, not real applications. Implement request queuing with a rate limiter (threading.Semaphore or asyncio.Semaphore combined with a delay between calls) to enforce at most 1 request per second. Vertex AI offers a 100 req/min base quota that can be raised via quota requests. Monitor quota via Cloud Monitoring dashboards and set alerts at 70% utilization.

Python 3.9+ · google-generativeai >=0.3.0 · tested on 0.7.x

Verified 2026-04 · gemini-2.0-flash, gemini-1.5-pro
