RateLimitError
openai.RateLimitError (HTTP 429)
Stack trace
openai.RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached', 'type': 'requests', 'code': 'rate_limit_exceeded'}}
Why it happens
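OpenAI enforces rate limits on API calls, including embedding requests, to prevent abuse and ensure fair usage. The limits vary by model and by your account's usage tier. They are measured in requests per minute (RPM) and tokens per minute (TPM), among other units. When your application sends embedding requests faster than these limits allow, or exhausts your account's quota, the API responds with HTTP 429 and the Python SDK raises RateLimitError.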
Detection
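Catch RateLimitError (the SDK's exception for HTTP 429) around every embedding call so a throttled request is handled instead of crashing your application. Log each occurrence so you can see how often, and under what load, limits are being hit.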
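Catching the error only tells you a limit has already been hit. Below is a minimal sketch of proactive monitoring. It assumes the x-ratelimit-remaining-requests and x-ratelimit-remaining-tokens response headers that OpenAI documents for its API. The helpers remaining_budget and near_limit are hypothetical, and the thresholds min_requests and min_tokens are illustrative values, not recommendations.

```python
from typing import Mapping, Optional, Tuple


def remaining_budget(headers: Mapping[str, str]) -> Tuple[Optional[int], Optional[int]]:
    """Return (remaining_requests, remaining_tokens) parsed from rate-limit headers.

    Header names are matched case-insensitively; a missing or non-numeric
    header yields None for that value.
    """
    lowered = {name.lower(): value for name, value in headers.items()}

    def _as_int(name: str) -> Optional[int]:
        value = lowered.get(name)
        return int(value) if value is not None and value.strip().isdigit() else None

    return (
        _as_int("x-ratelimit-remaining-requests"),
        _as_int("x-ratelimit-remaining-tokens"),
    )


def near_limit(headers: Mapping[str, str], min_requests: int = 5, min_tokens: int = 1000) -> bool:
    """True if either remaining budget is known and below its (hypothetical) threshold."""
    requests_left, tokens_left = remaining_budget(headers)
    low_requests = requests_left is not None and requests_left < min_requests
    low_tokens = tokens_left is not None and tokens_left < min_tokens
    return low_requests or low_tokens
```

With openai-python v1, client.embeddings.with_raw_response.create(...) returns an object whose .headers you can pass to near_limit, and whose .parse() returns the usual embeddings response. When near_limit returns True, slow down before sending the next request.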
Causes & fixes
Sending embedding requests too rapidly without delay or batching
Throttle requests, and batch inputs. The input parameter of the embeddings endpoint accepts a list of strings, so one request can embed many texts instead of sending one request per text.
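The following sketch groups texts into fixed-size batches and can optionally pause between requests. embed_batch is a hypothetical callable you supply. batch_size=100 and pause_s are illustrative values. The API also caps how many inputs and tokens a single request may carry, so check the current limits before raising batch_size.

```python
import time
from typing import Callable, Iterator, List, Sequence

Vector = List[float]


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size items, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[Vector]],
    batch_size: int = 100,
    pause_s: float = 0.0,
) -> List[Vector]:
    """Embed texts with one call per batch, sleeping pause_s between calls (throttling)."""
    vectors: List[Vector] = []
    for i, batch in enumerate(batched(texts, batch_size)):
        if i > 0 and pause_s > 0:
            time.sleep(pause_s)  # space out requests to lower the request rate
        vectors.extend(embed_batch(batch))
    return vectors
```

With the official client, embed_batch could be lambda batch: [d.embedding for d in sorted(client.embeddings.create(model="text-embedding-3-small", input=batch).data, key=lambda d: d.index)]. Sorting by index keeps each vector aligned with its input. For example, five texts at batch_size=2 become three requests instead of five.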
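Exceeding your OpenAI account's usage quota or spending limit
Check your OpenAI usage dashboard and upgrade your plan or request a quota increase if needed. In this case the error code is insufficient_quota rather than rate_limit_exceeded, and retrying will not help until the quota is restored.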
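Quota exhaustion also surfaces as RateLimitError (HTTP 429), so retry logic should distinguish it from an ordinary rate limit before sleeping. This minimal sketch assumes the error body has the shape shown in the stack trace above: an error object with a code field. error_code and should_retry are hypothetical helpers. They treat insufficient_quota as permanent and any other 429 as worth retrying.

```python
from typing import Any, Mapping, Optional

# 429 error codes that will not clear by waiting.
PERMANENT_CODES = {"insufficient_quota"}


def error_code(body: Any) -> Optional[str]:
    """Extract the code from either {'error': {...}} or the inner error object."""
    if not isinstance(body, Mapping):
        return None
    inner = body.get("error", body)
    if not isinstance(inner, Mapping):
        return None
    code = inner.get("code")
    return code if isinstance(code, str) else None


def should_retry(body: Any) -> bool:
    """Retry transient rate limits; give up on exhausted quota."""
    return error_code(body) not in PERMANENT_CODES
```

Inside except RateLimitError as exc:, this sketch assumes openai-python v1 keeps the parsed error object on exc.body (check your SDK version). If so, `if not should_retry(exc.body): raise` stops retrying once the quota is gone.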
Running multiple processes or threads that make embedding calls simultaneously
Serialize embedding requests, or route all workers through one shared rate limiter so their combined request rate stays below your limit.
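A shared limiter lets parallel workers keep running while capping their combined rate. The sketch below is a hypothetical thread-safe limiter that spaces calls evenly. Each caller reserves the next free time slot while holding a lock, then sleeps outside the lock until that slot arrives. It only coordinates threads within one process. Separate processes would need a shared store or a single dispatcher, which this sketch does not provide. Set rate from your own account's limit.

```python
import threading
import time


class RateLimiter:
    """Allow at most `rate` calls per `per` seconds across threads, evenly spaced."""

    def __init__(self, rate: int, per: float = 60.0) -> None:
        if rate < 1 or per <= 0:
            raise ValueError("rate must be >= 1 and per must be > 0")
        self._interval = per / rate          # minimum gap between call starts
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()   # earliest time the next call may start

    def acquire(self) -> None:
        """Block until the caller's reserved slot is reached."""
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self._interval
        delay = slot - now
        if delay > 0:
            time.sleep(delay)
```

Create one RateLimiter(rate=..., per=60.0) and call limiter.acquire() immediately before every client.embeddings.create(...) in each worker. With rate=600, calls start no more often than once every 0.1 s.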
Code: broken vs fixed
from openai import OpenAI
client = OpenAI()
# This line triggers RateLimitError when called too frequently
embedding = client.embeddings.create(model="text-embedding-3-small", input=["text1", "text2"])
print(embedding)

import os
import time
from openai import OpenAI, RateLimitError

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
inputs = ["text1", "text2"]

# Added retry with exponential backoff (waits of 1, 2, 4, 8 seconds) to handle rate limits
max_retries = 5
for attempt in range(max_retries):
    try:
        embedding = client.embeddings.create(model="text-embedding-3-small", input=inputs)
        print(embedding)
        break
    except RateLimitError:
        # Skip the pointless sleep after the final failed attempt
        if attempt < max_retries - 1:
            wait_time = 2 ** attempt
            print(f"Rate limit hit, retrying in {wait_time} seconds...")
            time.sleep(wait_time)
else:
    # Runs only if the loop never hit `break`, i.e. every attempt failed
    print("Failed after retries due to rate limits.")
Workaround
Catch RateLimitError and wait before retrying. Double the delay after each failed attempt (exponential backoff) so retries do not immediately run into the same limit.
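This workaround can be turned into a reusable helper. The sketch below uses one common variant, exponential backoff with "full jitter": each wait is drawn uniformly between zero and the current backoff cap, so many clients throttled at the same moment do not all retry together. retry_with_backoff and its parameters are hypothetical names. sleep is injectable so the behaviour can be tested without actually waiting.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying up to max_retries times when it raises one of retry_on."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the original error
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0.0, cap))  # full jitter
    raise RuntimeError("unreachable")
```

For example: retry_with_backoff(lambda: client.embeddings.create(model="text-embedding-3-small", input=inputs), retry_on=(RateLimitError,)). The official Python SDK already retries 429 responses on its own (twice by default, configurable with OpenAI(max_retries=...)). The two layers multiply, so size them together.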
Prevention
Batch inputs, limit concurrency, and monitor your usage and quota to stay within OpenAI's embedding rate limits and avoid 429 errors.
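Embedding limits count tokens as well as requests, so batching by item count alone can still overshoot a tokens-per-minute budget. The sketch below shows token-aware batching: pack_by_tokens fills each batch up to a token budget you choose. approx_tokens is a deliberately rough stand-in based on about four characters per token, a common rule of thumb for English text. Swap in a real tokenizer such as tiktoken when accuracy matters. Both function names are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List


def approx_tokens(text: str) -> int:
    """Rough token estimate (~4 characters per token); not an exact count."""
    return max(1, len(text) // 4)


def pack_by_tokens(
    texts: Iterable[str],
    max_tokens_per_batch: int,
    count_tokens: Callable[[str], int] = approx_tokens,
) -> Iterator[List[str]]:
    """Yield batches whose estimated token total stays within max_tokens_per_batch."""
    batch: List[str] = []
    used = 0
    for text in texts:
        cost = count_tokens(text)
        if cost > max_tokens_per_batch:
            raise ValueError("a single input exceeds the per-batch token budget")
        if batch and used + cost > max_tokens_per_batch:
            yield batch  # current batch is full; start a new one
            batch, used = [], 0
        batch.append(text)
        used += cost
    if batch:
        yield batch
```

Pacing then follows from simple division. If every batch stays under B tokens and your limit is L tokens per minute, sending at most floor(L / B) batches per minute keeps you under that limit. For instance, with a budget of 25 tokens, three 40-character texts (about 10 tokens each) pack into two batches: the first two texts together, then the third alone.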