High severity · HTTP 429 · Intermediate · Fix: 2-5 min

RateLimitError

openai.RateLimitError (HTTP 429)

What this error means
The OpenAI API returns a RateLimitError (HTTP 429) when the tokens your requests consume exceed your tokens-per-minute (TPM) quota.

Stack trace

traceback
openai.RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit exceeded for tokens per minute', 'type': 'requests', 'code': 'rate_limit_exceeded'}}
QUICK FIX
Catch RateLimitError and retry with exponential backoff, so the request waits for the per-minute token budget to reset instead of failing immediately.
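
The wait schedule behind exponential backoff can be sketched as below. This is a minimal illustration, not part of the openai package: backoff_delay is a hypothetical helper, and the 1-second base, 60-second cap, and randomized "full jitter" wait are assumptions chosen for the example.

```python
import random


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random.random):
    """Return a randomized wait in seconds for a zero-based retry attempt.

    The upper bound doubles with each attempt (base, 2*base, 4*base, ...)
    and is capped at `cap`; the wait is a random fraction of that bound so
    that many clients do not all retry at the same instant.
    """
    bound = min(cap, base * (2 ** attempt))
    return rng() * bound


# Typical use inside an `except RateLimitError:` block (assumed caller code):
#     time.sleep(backoff_delay(attempt))
```

With the defaults, the upper bound grows 1, 2, 4, 8, 16, 32 seconds and then stays at 60, which keeps early retries fast while preventing very long waits.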

Why it happens

OpenAI enforces rate limits on the number of tokens processed per minute to ensure fair usage and system stability. When your application sends or receives more tokens than your quota allows within a minute, the API returns this RateLimitError.

Detection

Monitor API responses for HTTP 429 status codes and catch openai.RateLimitError so that an exceeded token limit is logged and handled instead of crashing your app.
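
One way to make a 429 easier to diagnose is to log the rate-limit headers that accompany the response. The sketch below assumes the x-ratelimit-* header names documented by OpenAI and an optional retry-after header; rate_limit_snapshot is a hypothetical helper, and it accepts any header mapping so it can be tried without calling the API.

```python
RATE_LIMIT_HEADERS = (
    "retry-after",
    "x-ratelimit-limit-tokens",
    "x-ratelimit-remaining-tokens",
    "x-ratelimit-reset-tokens",
)


def rate_limit_snapshot(headers):
    """Pick the rate-limit headers out of a response-header mapping.

    Header names are matched case-insensitively; headers that are not
    present are simply left out of the result.
    """
    lowered = {str(name).lower(): value for name, value in headers.items()}
    return {name: lowered[name] for name in RATE_LIMIT_HEADERS if name in lowered}


# Assumed caller code with openai >= 1.0, where the exception carries the
# HTTP response:
#     except RateLimitError as e:
#         print(e.status_code, rate_limit_snapshot(e.response.headers))
```

Logging the remaining and reset values alongside the exception shows whether the limit was reached gradually or by a single oversized request.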

Causes & fixes

1

Your application is sending or receiving more tokens per minute than your account's rate limit allows.

✓ Fix

Reduce the frequency or size of requests, batch inputs efficiently, or move to a higher OpenAI usage tier to raise your token limits.

2

Multiple concurrent requests cumulatively exceed the tokens per minute quota.

✓ Fix

Implement request queuing or rate limiting in your client to throttle concurrent calls and stay within token limits.

3

A very large max_tokens value or prompt inflates the tokens counted per request, causing token spikes.

✓ Fix

Lower max_tokens and optimize prompt length to reduce token consumption per request.
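
For cause 3, a rough budget calculation shows how much max_tokens matters. This is a conservative sketch that treats each request as costing its prompt tokens plus its full max_tokens; that accounting, the 200,000 TPM limit, and the 500-token prompt are assumptions for illustration, not OpenAI's exact formula or your actual quota.

```python
def max_requests_per_minute(tpm_limit, prompt_tokens, max_tokens):
    """Conservative count of requests that fit in one minute's token budget."""
    per_request = prompt_tokens + max_tokens  # worst-case tokens per request
    return tpm_limit // per_request


# Hypothetical account: 200,000 tokens per minute, 500-token prompts.
large = max_requests_per_minute(200_000, 500, 10_000)  # 19 requests
small = max_requests_per_minute(200_000, 500, 1_000)   # 133 requests
```

Under these assumptions, cutting max_tokens from 10,000 to 1,000 raises the number of requests that fit in a minute from 19 to 133.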

Code: broken vs fixed

Broken - triggers the error
python
from openai import OpenAI
import os
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=10000  # Large max_tokens inflates the tokens counted against the limit
)  # No error handling: a 429 response raises RateLimitError and crashes the script
Fixed - works correctly
python
from openai import OpenAI, RateLimitError
import os
import time

# Note: API key must be set in environment variable OPENAI_API_KEY
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

max_attempts = 5
delay = 1.0  # Initial wait in seconds; doubles after each failed attempt

for attempt in range(1, max_attempts + 1):
    try:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "Hello"}],
            max_tokens=1000  # Reduced max_tokens to lower token usage per request
        )
        print(response)
        break
    except RateLimitError:
        if attempt == max_attempts:
            raise  # Give up after the last attempt
        print(f"Rate limit exceeded, retrying in {delay:.0f}s...")
        time.sleep(delay)
        delay = min(delay * 2, 60)  # Exponential backoff, capped at 60 seconds
Added RateLimitError handling with bounded exponential-backoff retries and reduced max_tokens to stay within the tokens-per-minute quota.

Workaround

Catch RateLimitError exceptions and implement a manual wait (e.g., time.sleep(60)) before retrying the request to avoid immediate failures.

Prevention

Use client-side rate limiting and token counting to throttle requests before they reach the API, and consider moving to a higher OpenAI usage tier if your workload regularly needs more tokens per minute.
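
Client-side throttling could be sketched as a sliding 60-second window over estimated token usage. TokenWindow is a hypothetical class written for this page, not an openai feature; it assumes you supply a token estimate per request (for example from a tokenizer such as tiktoken, or a rough character-based guess) and your own tokens-per-minute budget.

```python
import time
from collections import deque


class TokenWindow:
    """Track tokens spent in the last `window` seconds against a budget."""

    def __init__(self, tokens_per_minute, window=60.0, clock=time.monotonic):
        self.budget = tokens_per_minute
        self.window = window
        self.clock = clock
        self.spent = deque()  # (timestamp, tokens) pairs, oldest first

    def wait_time(self, tokens):
        """Seconds to wait until `tokens` more fit in the window (0.0 if now)."""
        if tokens > self.budget:
            raise ValueError("a single request exceeds the whole budget")
        now = self.clock()
        # Drop entries that have aged out of the window.
        while self.spent and now - self.spent[0][0] >= self.window:
            self.spent.popleft()
        used = sum(count for _, count in self.spent)
        if used + tokens <= self.budget:
            return 0.0
        # Walk the oldest entries until enough budget would be freed.
        for stamp, count in self.spent:
            used -= count
            if used + tokens <= self.budget:
                return stamp + self.window - now
        return 0.0  # not reached: the loop always frees enough budget

    def record(self, tokens):
        """Register tokens as spent at the current time."""
        self.spent.append((self.clock(), tokens))
```

Before each API call, the caller would sleep for wait_time(estimate) seconds and then call record(estimate), which keeps the running total under the budget instead of relying on 429 responses to slow the client down.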

Python 3.9+ · openai >=1.0.0 · tested on 1.8.0
Verified 2026-04