High severity · HTTP 429 · Intermediate · Fix: 2-5 min

RateLimitError

openai.RateLimitError (HTTP 429)

What this error means

The OpenAI API returns HTTP 429, which the Python SDK raises as openai.RateLimitError, when embedding requests exceed your account's rate limits or quota.

Stack trace

traceback

openai.RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached', 'type': 'requests', 'code': 'rate_limit_exceeded'}}

QUICK FIX

Add exponential backoff retry logic around embedding calls to automatically retry after rate limit errors.

Why it happens

OpenAI enforces rate limits on embedding API calls, measured in requests per minute and tokens per minute, to prevent abuse and ensure fair usage. When your application sends embedding requests too quickly or exceeds your quota, the API responds with HTTP 429, which the SDK raises as RateLimitError.

Detection

Monitor API responses for HTTP 429 status codes or catch RateLimitError exceptions to detect when rate limits are hit before your app crashes.
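As a rough illustration of the detection step, the sketch below counts and logs 429s so they show up in monitoring before they become crashes. The helper names, the logger name, and the in-process `Counter` are hypothetical; the check is duck-typed on the `status_code` attribute that the SDK's status errors carry, so the snippet runs without the library installed.

```python
import logging
from collections import Counter

logger = logging.getLogger("embedding_client")  # hypothetical logger name
rate_limit_hits = Counter()  # hypothetical in-process metric store


def is_rate_limit_error(exc: BaseException) -> bool:
    """Return True if the exception looks like an HTTP 429 from the API.

    Duck-typed on purpose: any exception carrying status_code == 429 counts.
    """
    return getattr(exc, "status_code", None) == 429


def record_if_rate_limited(exc: BaseException, endpoint: str = "embeddings") -> bool:
    """Log and count a rate-limit hit; return whether it was one."""
    if not is_rate_limit_error(exc):
        return False
    rate_limit_hits[endpoint] += 1
    logger.warning("429 on %s (total so far: %d)", endpoint, rate_limit_hits[endpoint])
    return True
```

Calling `record_if_rate_limited(exc)` inside an `except` block is one way to feed a dashboard or alert; a real deployment would likely export the counter to its metrics system instead of keeping it in memory.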

Causes & fixes

Sending embedding requests too rapidly without delay or batching

✓ Fix

Implement request throttling or batching to reduce the frequency of embedding API calls.
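A minimal sketch of the batching fix, assuming the embedding call is passed in as a plain function. `chunked`, `embed_in_batches`, and the default `batch_size` of 100 are illustrative choices, not values taken from OpenAI's documentation.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each with at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 100,  # assumed value; check the limits for your model
) -> List[List[float]]:
    """Call `embed_batch` once per chunk instead of once per text."""
    vectors: List[List[float]] = []
    for batch in chunked(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors
```

With the client used on this page, `embed_batch` might be a small function that calls `client.embeddings.create(model=..., input=batch)` and returns `[item.embedding for item in response.data]`. Sending 1,000 texts this way costs 10 requests rather than 1,000, although the tokens-per-minute limit still applies.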

Exceeding your OpenAI account's usage quota (for example, a monthly spend limit)

✓ Fix

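Check your OpenAI usage dashboard, then upgrade your plan or request a quota increase if needed. Retrying with backoff does not help here, because the quota stays exhausted until it is raised or reset.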

Using multiple parallel processes or threads making embedding calls simultaneously

✓ Fix

Serialize embedding requests or use a rate limiter to control concurrency.
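One possible way to control concurrency is a semaphore in front of the embedding call, sketched below. `run_with_concurrency_cap` and its default values are hypothetical, and `func` stands in for whatever function issues the request.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_with_concurrency_cap(
    func: Callable[[T], R],
    jobs: Iterable[T],
    max_in_flight: int = 2,  # assumed cap; tune to your account's limits
    workers: int = 8,
) -> List[R]:
    """Run `func` over `jobs` on a thread pool, letting at most
    `max_in_flight` calls execute at the same time."""
    gate = threading.BoundedSemaphore(max_in_flight)

    def guarded(job: T) -> R:
        with gate:  # blocks while max_in_flight calls are already running
            return func(job)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(guarded, jobs))  # results keep input order
```

Setting `max_in_flight=1` serializes the calls entirely. A semaphore only works within one process; separate processes would need a shared limiter, which this sketch does not cover.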

Code: broken vs fixed

Broken - triggers the error

python

from openai import OpenAI
client = OpenAI()

# This line triggers RateLimitError when called too frequently
embedding = client.embeddings.create(model="text-embedding-3-small", input=["text1", "text2"])
print(embedding)

Fixed - works correctly

python

import os
import time
from openai import OpenAI, RateLimitError

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

inputs = ["text1", "text2"]

# Retry with exponential backoff (1, 2, 4, 8 seconds) to handle rate limits
max_retries = 5
for attempt in range(max_retries):
    try:
        embedding = client.embeddings.create(model="text-embedding-3-small", input=inputs)
        print(embedding)
        break
    except RateLimitError:
        # Skip the sleep after the final attempt; there is nothing left to retry
        if attempt < max_retries - 1:
            wait_time = 2 ** attempt
            print(f"Rate limit hit, retrying in {wait_time} seconds...")
            time.sleep(wait_time)
else:
    # Runs only if the loop never hit `break`, i.e. every attempt failed
    print("Failed after retries due to rate limits.")

Added exponential backoff retry around the embedding call to handle and recover from rate limit errors gracefully.

⚠ Workaround

Catch RateLimitError exceptions and implement a manual delay before retrying embedding requests to avoid immediate failures.
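The manual delay can be chosen a little more carefully than a fixed sleep. The sketch below prefers a numeric `Retry-After` header when the response carries one and otherwise falls back to capped exponential backoff with jitter. `retry_delay`, `base`, and `cap` are hypothetical names and defaults, and whether a given 429 response includes `Retry-After` is an assumption, not something this page confirms.

```python
import random
from typing import Mapping, Optional


def retry_delay(
    attempt: int,
    headers: Optional[Mapping[str, str]] = None,
    base: float = 1.0,
    cap: float = 60.0,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    Prefers a numeric Retry-After header when one is present; otherwise
    falls back to capped exponential backoff with full jitter.
    """
    if headers:
        # Header names are case-insensitive, so normalise before lookup.
        lowered = {k.lower(): v for k, v in headers.items()}
        raw = lowered.get("retry-after")
        if raw is not None:
            try:
                return min(cap, max(0.0, float(raw)))
            except ValueError:
                pass  # e.g. an HTTP-date value; fall through to backoff
    # Full jitter: a random delay between 0 and the capped exponential bound.
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

In a retry loop, `time.sleep(retry_delay(attempt, exc.response.headers))` could take the place of a fixed `2 ** attempt` wait, assuming the caught exception is bound as `exc` and exposes the HTTP response. The jitter spreads retries out so that parallel workers do not all retry at the same instant.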

✓ Prevention

Use request batching and concurrency control, and monitor your usage quotas, to stay within OpenAI embedding rate limits and avoid 429 errors.
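To stay under a requests-per-minute limit proactively, a client-side limiter can pace calls before they are sent. The following is a sketch of a sliding-window limiter. The class name and parameters are hypothetical, the clock and sleep functions are injectable so the behavior can be tested without waiting, and the class is not thread-safe as written.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` acquisitions in any `period`-second window."""

    def __init__(
        self,
        max_calls: int,
        period: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if max_calls < 1:
            raise ValueError("max_calls must be at least 1")
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._stamps: Deque[float] = deque()  # times of recent acquisitions

    def acquire(self) -> None:
        """Block until another call fits in the window, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have aged out of the window.
            while self._stamps and now - self._stamps[0] >= self.period:
                self._stamps.popleft()
            if len(self._stamps) < self.max_calls:
                self._stamps.append(now)
                return
            # Wait until the oldest recorded call leaves the window.
            self._sleep(self.period - (now - self._stamps[0]))
```

Calling `limiter.acquire()` immediately before each embedding request keeps the request rate at or below `max_calls` per `period`. The right value for `max_calls` depends on the limits shown for your account, and a limiter like this does not account for token-based limits.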

Python 3.9+ · openai >=1.0.0 · tested on 1.x

Verified 2026-04
