RateLimitError
openai.RateLimitError (HTTP 429)
Stack trace
openai.RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached', 'type': 'requests', 'code': 'rate_limit_exceeded'}}
Why it happens
The OpenAI API, including the Assistants API, enforces rate limits to prevent abuse and ensure fair usage. When your application sends requests too quickly or exceeds your quota, the API responds with HTTP 429, which the Python SDK raises as openai.RateLimitError. The limit protects the service, so clients are expected to handle it with retries and backoff rather than treat it as a fatal error.
Detection
Monitor API responses for HTTP 429 status codes or catch openai.RateLimitError exceptions to detect rate limiting before your app crashes.
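A minimal sketch of the detection step, written so it runs without the SDK installed. It assumes the exception exposes `status_code` and `code` attributes, as the openai Python SDK's status errors are understood to do; `classify_api_error` is a hypothetical helper name, and the `insufficient_quota` code is an assumption about how quota exhaustion is reported.

```python
def classify_api_error(exc: BaseException) -> str:
    """Classify an exception as 'rate_limited', 'quota_exhausted', or 'other'.

    Assumes the exception carries `status_code` (HTTP status) and `code`
    (the API error code string); missing attributes are treated as None.
    """
    status = getattr(exc, "status_code", None)
    code = getattr(exc, "code", None)
    if status != 429:
        return "other"
    if code == "insufficient_quota":
        # Assumed code for an exhausted quota: retrying will not help here.
        return "quota_exhausted"
    # Any other 429 is treated as a transient rate limit that can be retried.
    return "rate_limited"
```

In an `except openai.RateLimitError as exc:` block, the result can be logged or counted so that rate limiting shows up in monitoring before it becomes an outage.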
Causes & fixes
Sending requests too rapidly without delay or backoff
Implement exponential backoff or fixed delays between requests to respect the API rate limits.
Exceeding your OpenAI account's monthly or per-minute quota
Check your OpenAI usage dashboard and upgrade your plan or request quota increases if needed.
Multiple parallel processes or threads making concurrent API calls
Serialize or limit concurrency of API calls to stay within rate limits.
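The backoff fix listed above can be sketched as a small generic wrapper. This is an illustration under stated assumptions, not the SDK's own retry logic: `call_with_backoff` and its parameters are hypothetical names, and the exception types to retry on are passed in by the caller (in real use, `(openai.RateLimitError,)`).

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exceptions with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error to the caller.
            # Delay doubles each attempt (1s, 2s, 4s, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% random jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("unreachable")
```

A call would look like `call_with_backoff(lambda: client.chat.completions.create(...), retry_on=(RateLimitError,))`. The official Python client also accepts a `max_retries` option for its built-in retries, which may be enough for simple cases.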
Code: broken vs fixed
from openai import OpenAI
client = OpenAI()
# This line triggers RateLimitError if too many requests are sent
response = client.chat.completions.create(model="gpt-4o-mini", messages=[{"role": "user", "content": "Hello"}])

# Fixed version: catch RateLimitError and retry once after a delay
import os
import time
from openai import OpenAI, RateLimitError
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
try:
    response = client.chat.completions.create(model="gpt-4o-mini", messages=[{"role": "user", "content": "Hello"}])
    print(response.choices[0].message.content)
except RateLimitError:
    print("Rate limit hit, retrying after delay...")
    time.sleep(10)  # Wait 10 seconds before retrying
    response = client.chat.completions.create(model="gpt-4o-mini", messages=[{"role": "user", "content": "Hello"}])
    print(response.choices[0].message.content)
# Note: API key is read from environment variable OPENAI_API_KEY
Workaround
Wrap API calls in try/except RateLimitError and implement a manual retry with a fixed sleep delay to avoid immediate failures.
Prevention
Combine client-side rate limiting with exponential backoff, and monitor your usage against your quota, so requests stay under the OpenAI Assistants API rate limits instead of running into them.
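Client-side rate limiting can be sketched as a thread-safe pacer that spaces requests evenly. This is one possible approach, not an OpenAI-provided utility: `MinIntervalLimiter` and its parameters are hypothetical, and the requests-per-minute value is an assumption you would take from your own account limits.

```python
import threading
import time
from typing import Callable


class MinIntervalLimiter:
    """Space calls so that at most `requests_per_minute` start per minute."""

    def __init__(
        self,
        requests_per_minute: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if requests_per_minute <= 0:
            raise ValueError("requests_per_minute must be positive")
        self._interval = 60.0 / requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_allowed = clock()

    def wait(self) -> float:
        """Block until this caller's slot arrives; return the seconds waited."""
        with self._lock:
            now = self._clock()
            # Reserve the next free slot, then push the schedule forward.
            slot = max(now, self._next_allowed)
            self._next_allowed = slot + self._interval
        delay = slot - now
        if delay > 0:
            self._sleep(delay)  # Sleep outside the lock so other threads can reserve.
        return delay
```

Each worker thread or process loop would call `limiter.wait()` immediately before the API request. Because slots are reserved under a lock, this also caps the combined rate of parallel threads sharing one limiter; separate processes would each need their own share of the limit.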