Critical severity · HTTP 429 · Intermediate · Fix: 2-5 min

RateLimitError

openai.RateLimitError (HTTP 429)

What this error means
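This error occurs when your client sends more requests (or tokens) than the API's rate limit allows in a given time window. The endpoint rejects further calls with HTTP 429 until your usage drops back under the limit.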

Stack trace

traceback
openai.RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached', 'type': 'requests', 'code': 'rate_limit_exceeded'}}
QUICK FIX
Add exponential backoff with retries and respect the API's documented rate limits to avoid triggering RateLimitError.
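
The quick fix above can be sketched as a small, library-agnostic helper. This is a minimal sketch, not part of the openai package: `retry_with_backoff`, its parameters, and the default values are hypothetical names chosen for illustration. It uses capped exponential delays with full jitter, and the `sleep` argument is injectable so the helper can be exercised without real waiting.

```python
import random
import time


def retry_with_backoff(call, retry_on, max_attempts=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Run call(); if it raises retry_on, wait and try again.

    The upper bound on the wait doubles each attempt (base_delay, 2x, 4x, ...)
    and is capped at max_delay. The actual wait is a random value between 0
    and that bound ("full jitter"), so parallel clients do not retry in step.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: let the caller see the error
            bound = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, bound))
```

With the openai client this would be called along the lines of `retry_with_backoff(lambda: client.chat.completions.create(...), RateLimitError)`, assuming `client` and `RateLimitError` are set up as in the examples further down.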

Why it happens

The AI endpoint enforces rate limits to prevent abuse and to share capacity fairly between clients. When a client sends too many requests too quickly, the server responds with HTTP 429, which the openai library raises as RateLimitError, and it keeps rejecting calls until the request rate falls back within the limit.

Detection

Monitor API response codes for HTTP 429 and log your request rate so you can see when you are approaching the rate limit before the error occurs.
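
One way to log request rates is a sliding-window counter that warns once usage passes a fraction of the limit. The sketch below is illustrative only: `RequestRateMonitor`, `limit_per_window`, and `warn_ratio` are hypothetical names, and the limit you pass in has to come from your own account's documented limits.

```python
import time
from collections import deque


class RequestRateMonitor:
    """Counts requests in a sliding time window (hypothetical helper)."""

    def __init__(self, limit_per_window, window_seconds=60.0,
                 warn_ratio=0.8, clock=time.monotonic):
        self.limit = limit_per_window
        self.window = window_seconds
        self.warn_ratio = warn_ratio
        self.clock = clock  # Injectable so the monitor can be tested
        self.timestamps = deque()

    def _evict(self, now):
        # Drop timestamps that have aged out of the window
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()

    def record(self):
        """Record one request; return True once usage reaches the warning level."""
        now = self.clock()
        self.timestamps.append(now)
        self._evict(now)
        return len(self.timestamps) >= self.limit * self.warn_ratio

    def current_count(self):
        """Number of requests recorded inside the current window."""
        self._evict(self.clock())
        return len(self.timestamps)
```

Calling `record()` just before each API request and logging a warning when it returns True gives an early signal, separate from counting the 429 responses themselves.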

Causes & fixes

1

Client sends requests too rapidly, exceeding the allowed rate limit.

✓ Fix

Implement exponential backoff and retry logic with delays between requests to stay within rate limits.

2

Spreading traffic across multiple API keys or IP addresses in an attempt to get around rate limits.

✓ Fix

Send traffic only through the API keys you are authorized to use, and do not rotate keys or IP addresses to get around limits.

3

Automated scripts or bots flooding the endpoint without respecting rate limits.

✓ Fix

Add client-side throttling and monitoring to ensure request frequency complies with API policies.
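
Client-side throttling, as suggested in the fixes above, can be as simple as spacing calls evenly. The following is a minimal sketch under stated assumptions: `Throttle` and `max_per_minute` are hypothetical names, the helper is single-threaded, and the rate you choose should sit below your account's actual limit.

```python
import time


class Throttle:
    """Spaces calls at least 60 / max_per_minute seconds apart (hypothetical helper)."""

    def __init__(self, max_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute
        self.clock = clock  # Injectable for testing
        self.sleep = sleep  # Injectable for testing
        self.next_allowed = float('-inf')  # First call is never delayed

    def wait(self):
        """Block until the next call is allowed; return the seconds slept."""
        now = self.clock()
        start = max(now, self.next_allowed)
        delay = start - now
        if delay > 0:
            self.sleep(delay)
        # Reserve the slot after this call
        self.next_allowed = start + self.min_interval
        return delay
```

Calling `throttle.wait()` immediately before each API request keeps a loop like the broken example below at a steady pace instead of firing requests back to back.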

Code: broken vs fixed

Broken - triggers the error
python
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])

for _ in range(1000):
    response = client.chat.completions.create(
        model='gpt-4o-mini',
        messages=[{'role': 'user', 'content': 'Hello'}]
    )  # This triggers RateLimitError due to rapid requests
    print(response.choices[0].message.content)
Fixed - works correctly
python
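from openai import OpenAI, RateLimitError
import os
import random
import time

client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])

MAX_ATTEMPTS = 5  # Give up after this many tries per request

for _ in range(1000):
    for attempt in range(MAX_ATTEMPTS):
        try:
            response = client.chat.completions.create(
                model='gpt-4o-mini',
                messages=[{'role': 'user', 'content': 'Hello'}]
            )
            print(response.choices[0].message.content)
            break  # Success: move on to the next request
        except RateLimitError:
            if attempt == MAX_ATTEMPTS - 1:
                raise  # Out of retries: surface the error
            delay = 2 ** attempt + random.random()  # 1s, 2s, 4s, ... plus jitter
            print(f'Rate limit hit, retrying in {delay:.1f}s...')
            time.sleep(delay)
Each call is now wrapped in a retry loop that catches RateLimitError, waits with an exponentially increasing delay plus jitter, and then retries the same request rather than skipping it. After five failed attempts the error is re-raised instead of being swallowed.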

Workaround

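Catch RateLimitError and retry after a delay so a temporary rate limit does not crash your application. The openai 1.x client already retries 429 responses a small number of times by default (configurable with the max_retries client option), so a RateLimitError that reaches your code means those automatic retries were exhausted.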

Prevention

Architect your application to respect the documented API rate limits: throttle requests on the client side, retry with exponential backoff, and manage API keys centrally so total usage across your services stays visible and within limits.

Python 3.9+ · openai >=1.0.0 · tested on 1.x
Verified 2026-04