RateLimitError
openai.RateLimitError (HTTP 429)
Stack trace
openai.RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit exceeded for tokens per minute', 'type': 'requests', 'code': 'rate_limit_exceeded'}}
Why it happens
OpenAI enforces rate limits on the number of tokens processed per minute (TPM) to ensure fair usage and system stability. When the prompt and completion tokens of your requests add up to more than your quota allows within a minute, the API responds with HTTP 429 and the Python client raises RateLimitError.
Detection
Monitor API responses for HTTP 429 status codes and catch openai.RateLimitError so that an exceeded token limit is logged and handled instead of crashing your app.
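One way to make this detection explicit is a small wrapper that logs every 429 before re-raising it. The sketch below is illustrative only: is_rate_limit_error and call_and_detect are hypothetical helper names, and the check relies on the exception carrying a status_code attribute equal to 429, which openai.RateLimitError does.

```python
import logging

logger = logging.getLogger("rate_limit_monitor")


def is_rate_limit_error(exc: BaseException) -> bool:
    """Return True if the exception looks like an HTTP 429 response."""
    # openai.RateLimitError carries status_code == 429; exceptions without
    # that attribute are treated as unrelated failures.
    return getattr(exc, "status_code", None) == 429


def call_and_detect(fn, *args, **kwargs):
    """Call fn, log a warning if it fails with HTTP 429, then re-raise."""
    try:
        return fn(*args, **kwargs)
    except Exception as exc:
        if is_rate_limit_error(exc):
            logger.warning("Rate limit hit (HTTP 429): %s", exc)
        raise
```

A call such as call_and_detect(client.chat.completions.create, model=..., messages=...) would then leave a log entry for every rate-limit hit while still letting the caller decide how to recover.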
Causes & fixes
Cause: Your application is sending or receiving more tokens per minute than your account's rate limit allows.
Fix: Reduce the frequency or size of requests, batch inputs efficiently, or raise your token limit (for example, by moving to a higher usage tier).
Cause: Multiple concurrent requests cumulatively exceed the tokens-per-minute quota.
Fix: Implement request queuing or rate limiting in your client to throttle concurrent calls and stay within token limits.
Cause: Requests use a very large max_tokens value or prompt, causing token spikes.
Fix: Lower max_tokens and shorten the prompt to reduce token consumption per request.
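The concurrency fix above can be sketched with a semaphore that caps how many calls are in flight at once. This is a minimal illustration, not a tuned implementation: MAX_CONCURRENT_CALLS, throttled_call, and run_all are hypothetical names, and the cap of 2 is an arbitrary assumption you would adjust to your own quota.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Assumed cap on simultaneous API calls; tune it to your own limits.
MAX_CONCURRENT_CALLS = 2
_slots = threading.BoundedSemaphore(MAX_CONCURRENT_CALLS)


def throttled_call(fn, *args, **kwargs):
    """Run fn only while holding one of the limited concurrency slots."""
    with _slots:
        return fn(*args, **kwargs)


def run_all(fn, inputs, workers=8):
    """Apply fn to every input from a thread pool, throttled by the semaphore."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order in the returned results.
        return list(pool.map(lambda item: throttled_call(fn, item), inputs))
```

Here fn would be a function that wraps client.chat.completions.create for a single input. The pool may have more workers than slots; the extra workers simply wait on the semaphore.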
Code: broken vs fixed
# Broken: oversized max_tokens and no error handling
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=10000,  # This large token request may trigger the rate limit
)  # This line raises RateLimitError

# Fixed: smaller max_tokens, plus a wait-and-retry on RateLimitError
from openai import OpenAI, RateLimitError
import os
import time

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
try:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello"}],
        max_tokens=1000,  # Reduced max_tokens to avoid the rate limit
    )
    print(response)
except RateLimitError:
    print("Rate limit exceeded, retrying after delay...")
    time.sleep(60)  # Wait 60 seconds before retrying
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello"}],
        max_tokens=1000,
    )
    print(response)
# Note: API key must be set in environment variable OPENAI_API_KEY
Workaround
Catch RateLimitError exceptions and implement a manual wait (e.g., time.sleep(60)) before retrying the request to avoid immediate failures.
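A fixed 60-second sleep retries only once and can wait longer than necessary. A common alternative is exponential backoff with a little random jitter, sketched below. The helper name retry_with_backoff and its default values are assumptions for illustration, and the sleep parameter exists only so the wait can be replaced in tests.

```python
import random
import time


def retry_with_backoff(fn, retry_on, max_attempts=5,
                       base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call fn(), retrying on the given exception type(s) with growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # Delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Up to 10% jitter keeps parallel clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

With the client from the code above, a call could look like retry_with_backoff(lambda: client.chat.completions.create(...), retry_on=RateLimitError). The OpenAI Python client also accepts a max_retries option that retries some failures automatically, so it is worth checking whether that setting already covers your case.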
Prevention
Use client-side rate limiting and token counting to throttle requests proactively. If you still hit the limit regularly, move to a higher usage tier or request a higher token quota.
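Proactive throttling can be sketched as a token bucket that refills at the per-minute rate and makes each request wait until its estimated token cost fits. Everything here is an assumption for illustration: TokenBucket and estimate_tokens are hypothetical names, the four-characters-per-token figure is a rough heuristic rather than a real tokenizer, and the class is not thread-safe as written.

```python
import time


def estimate_tokens(prompt: str, max_tokens: int) -> int:
    """Rough request cost: about 4 characters per prompt token, plus max_tokens."""
    return len(prompt) // 4 + 1 + max_tokens


class TokenBucket:
    """Budget that refills continuously at tokens_per_minute / 60 per second."""

    def __init__(self, tokens_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.capacity = float(tokens_per_minute)
        self.rate = tokens_per_minute / 60.0  # tokens restored per second
        self.available = self.capacity
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def _refill(self):
        now = self.clock()
        elapsed = now - self.last
        self.available = min(self.capacity, self.available + elapsed * self.rate)
        self.last = now

    def acquire(self, tokens):
        """Block until `tokens` fit in the budget, then deduct them."""
        if tokens > self.capacity:
            raise ValueError("request exceeds the per-minute token budget")
        while True:
            self._refill()
            if self.available >= tokens:
                self.available -= tokens
                return
            # Sleep roughly as long as the missing tokens take to refill.
            self.sleep((tokens - self.available) / self.rate)
```

Before each API call you would run bucket.acquire(estimate_tokens(prompt, max_tokens)) against a bucket sized to your own quota. For accurate counts, a real tokenizer would replace the character heuristic.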