RateLimitError
openai.RateLimitError (HTTP 429)
Stack trace
openai.RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached for fine-tuned model', 'type': 'requests', 'code': 'rate_limit_exceeded'}}
Why it happens
OpenAI enforces rate limits on fine-tuned models to ensure fair usage and system stability. When your application sends requests faster than its quota allows, or keeps too many requests in flight at once, the API responds with HTTP 429, which the Python SDK raises as openai.RateLimitError.
Detection
Monitor API responses for HTTP 429 status codes and catch openai.RateLimitError, so that rate limiting is detected and handled instead of surfacing as an unhandled exception that crashes your app.
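One way to make this detection concrete is a small wrapper that counts 429s as they pass through. The sketch below is illustrative only: `RateLimitMonitor` and `FakeRateLimitError` are hypothetical names, and the fake exception stands in for openai.RateLimitError so the example runs without network access. It relies only on the exception carrying a `status_code` attribute.

```python
import logging

logger = logging.getLogger("rate_limit_monitor")


class FakeRateLimitError(Exception):
    """Hypothetical stand-in for openai.RateLimitError (keeps the sketch offline)."""

    status_code = 429


class RateLimitMonitor:
    """Counts HTTP 429 errors so they can be logged or exported as a metric."""

    def __init__(self):
        self.hits = 0

    def call(self, fn, *args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            # Only the presence of a status_code attribute is assumed here.
            if getattr(exc, "status_code", None) == 429:
                self.hits += 1
                logger.warning("Rate limit hit (%d so far): %s", self.hits, exc)
            raise  # detection only; the caller decides how to recover
```

In a real application you would pass the actual API call as `fn` and alert when `hits` grows faster than expected.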
Causes & fixes
Sending requests too quickly, exceeding the fine-tuned model's rate limit quota
Implement retry logic with exponential backoff, and add delays between requests to stay within the allowed rate limits.
Too many concurrent requests to the fine-tuned model, exceeding concurrency limits
Limit the number of parallel API calls by queuing requests or using a concurrency control mechanism.
Using a fine-tuned model with a low quota or an insufficient subscription plan
Upgrade your OpenAI subscription plan or request a quota increase from OpenAI support.
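The concurrency fix above can be sketched with a semaphore that caps the number of in-flight calls. This is a minimal illustration under stated assumptions: `ConcurrencyLimiter`, `fake_request`, and `run_batch` are hypothetical names, `fake_request` stands in for the real API call, and the cap of 2 used in the usage note is arbitrary, not an OpenAI limit.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyLimiter:
    """Allows at most max_in_flight calls to run at the same time."""

    def __init__(self, max_in_flight):
        self._slots = threading.BoundedSemaphore(max_in_flight)
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0  # highest concurrency observed, useful for verification

    def run(self, fn, *args, **kwargs):
        with self._slots:  # blocks while all slots are taken
            with self._lock:
                self._active += 1
                self.peak = max(self.peak, self._active)
            try:
                return fn(*args, **kwargs)
            finally:
                with self._lock:
                    self._active -= 1


def fake_request(prompt):
    """Hypothetical stand-in for client.chat.completions.create(...)."""
    time.sleep(0.01)
    return f"echo: {prompt}"


def run_batch(prompts, limiter, workers=8):
    """Submit all prompts to a thread pool; the limiter caps real concurrency."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: limiter.run(fake_request, p), prompts))
```

For example, `run_batch(prompts, ConcurrencyLimiter(2))` still uses eight worker threads, but no more than two requests are ever in flight together.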
Code: broken vs fixed
# Broken: no handling for RateLimitError
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="fine-tuned-model",
    messages=[{"role": "user", "content": "Hello"}],
)  # This line may raise RateLimitError if limits are exceeded
print(response.choices[0].message.content)

# Fixed: retry with exponential backoff
import os
import time

from openai import OpenAI, RateLimitError

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
max_retries = 3
for attempt in range(max_retries):
    try:
        response = client.chat.completions.create(
            model=os.environ["OPENAI_FINE_TUNED_MODEL"],  # Use env var for model name
            messages=[{"role": "user", "content": "Hello"}],
        )
        print(response.choices[0].message.content)
        break
    except RateLimitError:
        if attempt < max_retries - 1:
            wait_time = 2 ** attempt  # exponential backoff: 1s, then 2s
            print(f"Rate limit hit, retrying in {wait_time}s...")
            time.sleep(wait_time)
        else:
            raise  # re-raise after max retries

Workaround
Wrap API calls in try/except RateLimitError and, when the exception is raised, wait a fixed delay before retrying to temporarily reduce the request rate.
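A fixed-delay retry of this kind might look like the sketch below. The helper `call_with_fixed_delay` and the `FakeRateLimitError` class are hypothetical; in real code you would pass `RateLimitError` from the openai package as `retry_on`. The 5-second default delay and three attempts are assumptions to tune, not recommended values.

```python
import time


class FakeRateLimitError(Exception):
    """Hypothetical stand-in for openai.RateLimitError (keeps the sketch offline)."""


def call_with_fixed_delay(fn, retry_on, delay_s=5.0, max_attempts=3, sleep=time.sleep):
    """Call fn(); on retry_on, wait delay_s seconds and try again.

    Re-raises the last error once max_attempts calls have failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise
            sleep(delay_s)  # same delay every time, unlike exponential backoff
```

The `sleep` parameter is injectable only so the helper can be exercised without real waiting; production code can leave it at the default.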
Prevention
Design your application to respect OpenAI rate limits by implementing concurrency controls and retries with exponential backoff, and monitor usage so you can request a quota increase before you reach the limit.
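Beyond retries, an application can pace itself so that it rarely reaches the limit at all. The following is a rough sketch of client-side pacing, not an OpenAI feature: `RequestPacer` is a hypothetical helper, and the requests-per-minute budget you pass in is an assumption you would set from your own account's published limits.

```python
import threading
import time


class RequestPacer:
    """Spaces request starts at least 60 / requests_per_minute seconds apart."""

    def __init__(self, requests_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_allowed = None  # earliest time the next request may start

    def wait(self):
        """Block until the next request may start; return the seconds waited."""
        with self._lock:
            now = self._clock()
            if self._next_allowed is None or now >= self._next_allowed:
                # No backlog: go immediately and reserve the following slot.
                self._next_allowed = now + self.min_interval
                return 0.0
            # Backlog: take the reserved slot and push the next one further out.
            delay = self._next_allowed - now
            self._next_allowed += self.min_interval
        self._sleep(delay)  # sleep outside the lock so other threads can queue
        return delay
```

Calling `pacer.wait()` immediately before each API request keeps the request rate under the chosen budget; combining it with backoff still matters, since token-based limits or other clients sharing the quota can trigger a 429 anyway.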