RateLimitError
openai.RateLimitError (HTTP 429)
Stack trace
openai.RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached', 'type': 'requests', 'code': 'rate_limit_exceeded'}}
Why it happens
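The API enforces strict rate limits to prevent abuse. When a client exceeds these limits by sending too many requests too quickly, or tries to get around them through unauthorized means, the server responds with HTTP 429, which the openai Python library raises as RateLimitError. Further calls are rejected until the request rate drops back under the limit.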
Detection
Monitor API responses for HTTP 429 status codes, and log your request rate so that you can see when you are approaching the limit before the error occurs.
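As a rough illustration of the monitoring described above, the sketch below counts requests in a sliding time window and flags when the count nears a configured limit. The class name RateMonitor, the 80% warning ratio, and the limit value are assumptions made for this example; use the limits documented for your own account.

```python
import time
from collections import deque


class RateMonitor:
    """Tracks request timestamps in a sliding window (hypothetical helper)."""

    def __init__(self, limit_per_window, window_seconds=60.0,
                 warn_ratio=0.8, clock=time.monotonic):
        self.limit = limit_per_window   # assumed limit; check your account's real value
        self.window = window_seconds
        self.warn_ratio = warn_ratio    # warn once this fraction of the limit is used
        self.clock = clock
        self.requests = deque()         # timestamps of requests inside the window
        self.rate_limited = 0           # number of 429 responses seen so far

    def record(self, status_code):
        """Record one completed request and its HTTP status code."""
        now = self.clock()
        self.requests.append(now)
        if status_code == 429:
            self.rate_limited += 1
        self._evict(now)

    def _evict(self, now):
        # Drop timestamps that have fallen out of the window.
        while self.requests and now - self.requests[0] >= self.window:
            self.requests.popleft()

    def current_rate(self):
        """Number of requests recorded within the last window."""
        self._evict(self.clock())
        return len(self.requests)

    def near_limit(self):
        """True once usage reaches warn_ratio of the configured limit."""
        return self.current_rate() >= self.warn_ratio * self.limit
```

Call record() after every API response (passing 429 when RateLimitError is caught) and log a warning whenever near_limit() returns True.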
Causes & fixes
Cause: The client sends requests too rapidly and exceeds the allowed rate limit.
Fix: Retry with exponential backoff, and add delays between requests to stay within the limit.
Cause: Multiple API keys or IP addresses are used to circumvent rate limits.
Fix: Consolidate usage under authorized API keys, and do not spread traffic across extra keys or IP addresses to evade limits.
Cause: Automated scripts or bots flood the endpoint without respecting rate limits.
Fix: Add client-side throttling and monitoring so that request frequency complies with API policies.
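The client-side throttling mentioned above can be sketched as a small helper that spaces calls a minimum interval apart. The Throttle class and the max_per_second value are hypothetical and not part of the openai library; this version is meant for a single thread.

```python
import time


class Throttle:
    """Spaces calls at least 1 / max_per_second seconds apart (hypothetical helper)."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = 0.0  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self.clock()
        if now < self.next_allowed:
            # Too early: sleep for the remaining part of the interval.
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval
```

Calling throttle.wait() immediately before each client.chat.completions.create(...) call keeps a loop at or below the chosen rate. Pick max_per_second from the limits documented for your account.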
Code: broken vs fixed
Broken:
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])

for _ in range(1000):
    response = client.chat.completions.create(
        model='gpt-4o-mini',
        messages=[{'role': 'user', 'content': 'Hello'}]
    )  # This triggers RateLimitError due to rapid requests
    print(response.choices[0].message.content)

Fixed:
from openai import OpenAI, RateLimitError
import os
import time

client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])

MAX_RETRIES = 5  # give up on a request after this many rate-limited attempts

for _ in range(1000):
    delay = 1.0  # initial backoff in seconds, doubled after each 429
    for attempt in range(MAX_RETRIES):
        try:
            response = client.chat.completions.create(
                model='gpt-4o-mini',
                messages=[{'role': 'user', 'content': 'Hello'}]
            )
            print(response.choices[0].message.content)
            break  # Success: move on to the next request
        except RateLimitError:
            print(f'Rate limit hit, backing off for {delay:.0f}s...')
            time.sleep(delay)  # Wait before retrying the same request
            delay *= 2  # Exponential backoff
    else:
        raise RuntimeError('Still rate limited after all retries')
Workaround
Catch RateLimitError and retry the request after a delay, so that temporary rate limiting does not crash your client. Cap the number of retries so that a persistent limit still surfaces as an error.
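One way to package this workaround is a reusable retry helper with exponential backoff and random jitter. The sketch below is an assumption-laden example, not the library's own retry mechanism: call_with_backoff and its parameters are hypothetical names, and RateLimited is a stand-in exception so the code runs without the SDK installed.

```python
import random
import time


class RateLimited(Exception):
    """Stand-in for openai.RateLimitError so the sketch runs without the SDK."""


def call_with_backoff(fn, retry_on=RateLimited, max_retries=5,
                      base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call fn(); on retry_on, wait up to base_delay * 2**attempt seconds and retry."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: a random wait in [0, delay] keeps clients from retrying in lockstep.
            sleep(random.uniform(0, delay))
```

With the real SDK, pass retry_on=RateLimitError and wrap the request in a function or lambda, for example call_with_backoff(lambda: client.chat.completions.create(...), retry_on=RateLimitError).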
Prevention
Design your application to stay within the documented API rate limits: throttle requests on the client side, retry with exponential backoff, and manage API keys centrally so that all traffic goes through authorized keys and is not spread across keys to evade limits.