Severity: High · HTTP 429 · Level: beginner · Fix: 2-5 min

RateLimitError

openai.RateLimitError (HTTP 429)

What this error means
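The OpenAI API (including the Assistants endpoints) answered with HTTP 429, which the Python SDK raises as openai.RateLimitError. Your application sent too many requests in a short time or has used up its quota.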

Stack trace

traceback
openai.RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached', 'type': 'requests', 'code': 'rate_limit_exceeded'}}
QUICK FIX
Catch openai.RateLimitError exceptions and retry requests after a delay using exponential backoff.
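The quick fix can be sketched as a small retry helper. This is a minimal illustration, not the SDK's own retry logic: call_with_backoff and its parameters (max_attempts, base_delay, max_delay, sleep) are hypothetical names chosen for this example, and the exception type to retry on is passed in so the helper runs without the openai package installed.

```python
import random
import time


def call_with_backoff(call, retry_on, max_attempts=5, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep):
    """Run call(); on a retry_on exception, wait and try again.

    The wait doubles after each failed attempt (base_delay, 2x, 4x, ...),
    is capped at max_delay, and gets up to 10% random jitter added so that
    parallel clients do not all retry at the same instant.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

With the SDK, the call would presumably look like call_with_backoff(lambda: client.chat.completions.create(...), retry_on=RateLimitError). The openai v1 client also accepts a max_retries argument, so it is worth checking the SDK documentation for your version before adding a custom loop on top.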

Why it happens

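The OpenAI API, including the Assistants endpoints, enforces rate limits to prevent abuse and ensure fair usage. The limits are measured over short windows, such as requests per minute and tokens per minute. When your application sends requests faster than those limits allow, or your account has used up its quota, the API responds with HTTP 429 and the Python SDK raises openai.RateLimitError. This protects the service, so clients are expected to back off and retry rather than resend immediately.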

Detection

Monitor API responses for HTTP 429 status codes or catch openai.RateLimitError exceptions to detect rate limiting before your app crashes.
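Not every 429 calls for the same reaction, so it can help to inspect the error body when you detect one. The sketch below classifies a decoded error payload shaped like the one in the stack trace. classify_429 is a hypothetical helper, and the insufficient_quota code is an assumption that does not appear in the trace above; verify it against the responses your account actually receives.

```python
def classify_429(payload):
    """Return 'rate_limit', 'quota', or 'other' for a decoded 429 error body."""
    # Accept both the full body {'error': {...}} and the inner dict on its own
    error = payload.get("error", payload) if isinstance(payload, dict) else {}
    if not isinstance(error, dict):
        error = {}
    code = error.get("code")
    if code == "rate_limit_exceeded":
        return "rate_limit"  # temporary: retrying after a delay should help
    if code == "insufficient_quota":  # assumed code for an exhausted quota
        return "quota"  # retrying will not help; check usage and billing
    return "other"
```

A "rate_limit" result is a candidate for a delayed retry, while a "quota" result is better surfaced to an operator. In an except RateLimitError as exc: block, the payload would presumably come from the exception's body attribute; treat that as an assumption to confirm for your SDK version.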

Causes & fixes

1

Sending requests too rapidly without delay or backoff

✓ Fix

Implement exponential backoff or fixed delays between requests to respect the API rate limits.

2

Exceeding your OpenAI account's monthly or per-minute quota

✓ Fix

Check your OpenAI usage dashboard and upgrade your plan or request quota increases if needed.

3

Multiple parallel processes or threads making concurrent API calls

✓ Fix

Serialize or limit concurrency of API calls to stay within rate limits.
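For the third cause, one way to cap concurrency inside a single process is a semaphore around each call. This is a sketch under assumptions: ConcurrencyLimiter and run_all are hypothetical names, and the right max_in_flight value depends on your account's limits.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyLimiter:
    """Allow at most max_in_flight wrapped calls to run at the same time."""

    def __init__(self, max_in_flight):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def run(self, call, *args, **kwargs):
        with self._slots:  # blocks until one of the slots is free
            return call(*args, **kwargs)


def run_all(limiter, call, items, workers=8):
    """Apply call to every item from a thread pool, gated by the limiter."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda item: limiter.run(call, item), items))
```

The semaphore only coordinates threads within one process. Separate worker processes or machines each get their own limiter, so they need a shared budget (for example, dividing the allowed concurrency between them) to stay under the same account-wide limit.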

Code: broken vs fixed

Broken - triggers the error
python
from openai import OpenAI

client = OpenAI()

# No error handling: if the API answers with HTTP 429, this call raises RateLimitError and the script stops
response = client.chat.completions.create(model="gpt-4o-mini", messages=[{"role": "user", "content": "Hello"}])
Fixed - works correctly
python
import os
import time
from openai import OpenAI, RateLimitError

# The API key is read from the OPENAI_API_KEY environment variable
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

MAX_ATTEMPTS = 3

for attempt in range(1, MAX_ATTEMPTS + 1):
    try:
        response = client.chat.completions.create(model="gpt-4o-mini", messages=[{"role": "user", "content": "Hello"}])
        print(response.choices[0].message.content)
        break  # success, stop retrying
    except RateLimitError:
        if attempt == MAX_ATTEMPTS:
            raise  # still rate limited after the last attempt
        print("Rate limit hit, retrying after delay...")
        time.sleep(10)  # Wait 10 seconds before retrying
Added explicit catch for RateLimitError with a retry delay to handle rate limiting gracefully.

Workaround

Wrap API calls in try/except RateLimitError and implement a manual retry with a fixed sleep delay to avoid immediate failures.

Prevention

Use client-side rate limiting with exponential backoff and monitor usage quotas to avoid hitting OpenAI Assistants API rate limits.
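Client-side rate limiting can be as simple as a sliding-window throttle that is consulted before every request. The sketch below is one possible shape, not an official recipe: RequestThrottle is a hypothetical class, the max_calls and period values are placeholders for the limits shown on your own account, and the clock and sleep parameters exist only to make the class easy to test.

```python
import time
from collections import deque


class RequestThrottle:
    """Block so that no more than max_calls start within any `period` seconds."""

    def __init__(self, max_calls, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._starts = deque()  # start times of calls inside the current window

    def wait(self):
        """Call immediately before each API request."""
        while True:
            now = self._clock()
            # Forget calls that have left the sliding window
            while self._starts and now - self._starts[0] >= self.period:
                self._starts.popleft()
            if len(self._starts) < self.max_calls:
                self._starts.append(now)
                return
            # Sleep until the oldest call in the window expires
            self._sleep(self.period - (now - self._starts[0]))
```

Calling throttle.wait() before each request keeps a single-threaded client under a requests-per-minute budget. The class is not thread-safe as written and does not account for token-based limits, so treat it as a starting point and keep the retry handling in place as a fallback.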

Python 3.9+ · openai >=1.0.0 · tested on 1.5.x
Verified 2026-04