RateLimitError (HTTP 429)
Stack trace
openai.RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached', 'type': 'requests', 'code': 'rate_limit_exceeded'}}
Why it happens
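Azure OpenAI enforces rate limits on each model deployment, measured in tokens per minute (TPM) and requests per minute (RPM), to manage resource allocation. When the requests sent to a deployment cumulatively exceed either limit, the service rejects further requests with HTTP 429 until usage falls back under the limit, and the openai Python library raises that response as RateLimitError.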
Detection
Catch RateLimitError (or check for HTTP 429 status codes if you call the REST API directly) so that a throttled request is handled instead of crashing the application.
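A 429 response can also carry a hint about how long to wait. The sketch below assumes the service sends a retry-after-ms (milliseconds) or retry-after (seconds) header; these are the headers the openai Python client consults for its own retries, but check what your deployment actually returns. The helper name retry_delay_seconds is hypothetical.

```python
from typing import Mapping, Optional


def retry_delay_seconds(headers: Mapping[str, str]) -> Optional[float]:
    """Return the server-suggested wait in seconds, or None if there is no usable hint.

    Hypothetical helper: assumes a retry-after-ms or retry-after response header.
    """
    # Lower-case header names so plain dicts with any casing work too.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Prefer the millisecond header; fall back to whole seconds.
    for name, divisor in (("retry-after-ms", 1000.0), ("retry-after", 1.0)):
        value = lowered.get(name)
        if value is None:
            continue
        try:
            return float(value) / divisor
        except ValueError:
            # e.g. the HTTP-date form of Retry-After, which this sketch ignores
            continue
    return None
```

With the openai package, a caught exception exposes the raw HTTP response, so inside an except RateLimitError as err: block you could call retry_delay_seconds(err.response.headers) and fall back to your own delay when it returns None.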
Causes & fixes
Sending more tokens in a short period than the deployment's tokens-per-minute quota allows
Send requests less often, shorten prompts, or cap completion length (for example with the max_tokens parameter) so that total usage stays within the tokens-per-minute limit.
Multiple parallel requests cumulatively exceeding the token rate limit
Implement request queuing or rate limiting in your client to serialize or throttle requests to Azure OpenAI.
Running on a low tokens-per-minute quota without adjusting usage to match
Increase the TPM allocated to the deployment or request a higher quota for your subscription, or reduce token usage so it fits the current quota.
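For the parallel-request case, a bounded worker pool both queues and throttles: work beyond the cap waits in the pool's queue until a slot frees up. This is a minimal sketch using the standard library's ThreadPoolExecutor; run_throttled and the cap of 4 are hypothetical and should be tuned to your deployment's quota. Capping concurrency limits bursts but does not by itself enforce a tokens-per-minute budget.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")

MAX_IN_FLIGHT = 4  # hypothetical cap; tune to your deployment's quota


def run_throttled(tasks: List[Callable[[], T]], max_in_flight: int = MAX_IN_FLIGHT) -> List[T]:
    """Run zero-argument callables concurrently, never more than max_in_flight at once."""
    # The executor keeps extra tasks in its internal queue until a worker is free.
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        futures = [pool.submit(task) for task in tasks]
        # Results come back in submission order; a task's exception re-raises here.
        return [future.result() for future in futures]
```

Each task would be a zero-argument callable that wraps one client.chat.completions.create(...) call, for example one lambda per prompt.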
Code: broken vs fixed
Broken:
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)  # Raises an unhandled RateLimitError if the rate limit is exceeded
print(response)
Fixed:
import os
import time

from openai import OpenAI, RateLimitError

# Note: the API key must be set in the environment variable OPENAI_API_KEY.
# An AzureOpenAI client raises the same RateLimitError class.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

max_attempts = 5
delay = 2.0  # seconds before the first retry; doubled after each failure
for attempt in range(1, max_attempts + 1):
    try:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "Hello"}],
        )
        print(response)
        break
    except RateLimitError:
        if attempt == max_attempts:
            raise  # give up after the last attempt instead of looping forever
        print(f"Rate limit exceeded, retrying in {delay:.0f}s...")
        time.sleep(delay)
        delay *= 2  # exponential backoff
Workaround
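Catch RateLimitError and retry with exponential backoff: wait, retry, and double the wait after each consecutive failure, up to a cap and a maximum number of attempts, so a brief burst of throttling does not fail the request outright. The openai Python client also retries 429 responses on its own (controlled by its max_retries option, which defaults to 2) before raising RateLimitError to your code.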
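A reusable version of that retry loop can be sketched with full jitter (a random wait between zero and the current exponential cap), so that many clients throttled at the same moment do not all retry in lockstep. call_with_backoff and its default values are hypothetical; pass the exception types to retry, such as (RateLimitError,) from the openai package.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exception types with jittered exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: re-raise the last error
            # Full jitter: wait a random time up to base_delay * 2**attempt, capped at max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0.0, cap))
    raise AssertionError("unreachable: the loop always returns or raises")
```

For example, call_with_backoff(lambda: client.chat.completions.create(model=..., messages=...), retry_on=(RateLimitError,)). Because the client's own max_retries retries happen inside each attempt, the totals multiply; constructing the client with max_retries=0 leaves the retry policy entirely to this helper.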
Prevention
Throttle requests on the client side and batch small requests where possible so that token usage stays within the deployment's quota, and monitor the deployment's usage metrics so you can raise the quota before traffic outgrows it.
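Client-side throttling can be sketched as a rolling 60-second token budget: before each request, block until its estimated token count fits. This is a sketch under stated assumptions, not Azure's own accounting. TokenWindowLimiter is hypothetical, tokens_per_minute should be set somewhat below the deployment's TPM quota to leave headroom, the per-request token estimate comes from you (a tokenizer, or the rough rule of thumb of about four characters per English token), and the class is not thread-safe as written.

```python
import collections
import time
from typing import Callable, Deque, Tuple


class TokenWindowLimiter:
    """Hypothetical limiter: blocks until a request's tokens fit in a rolling 60 s budget."""

    def __init__(
        self,
        tokens_per_minute: int,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.limit = tokens_per_minute
        self.clock = clock
        self.sleep = sleep
        self.window: Deque[Tuple[float, int]] = collections.deque()  # (timestamp, tokens)
        self.used = 0  # tokens recorded in the last 60 seconds

    def _evict(self, now: float) -> None:
        # Drop entries that are 60 seconds old or older from the rolling total.
        while self.window and now - self.window[0][0] >= 60.0:
            _, tokens = self.window.popleft()
            self.used -= tokens

    def acquire(self, tokens: int) -> None:
        """Wait until `tokens` fit under the limit, then record them."""
        if tokens > self.limit:
            raise ValueError("request is larger than the whole per-minute budget")
        while True:
            now = self.clock()
            self._evict(now)
            if self.used + tokens <= self.limit:
                self.window.append((now, tokens))
                self.used += tokens
                return
            # Over budget, so the window is non-empty: sleep until its oldest entry expires.
            self.sleep(60.0 - (now - self.window[0][0]))
```

Call limiter.acquire(estimated_tokens) immediately before each chat.completions.create call; if several threads share one limiter, guard acquire with a threading.Lock.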