RateLimitError
litellm.client.RateLimitError (HTTP 429)
Stack trace
Traceback (most recent call last):
  File "app.py", line 42, in <module>
    response = client.chat.completions.create(model="litellm-1.0", messages=messages)
  File "/usr/local/lib/python3.9/site-packages/litellm/client.py", line 210, in create
    raise RateLimitError("Rate limit exceeded")
litellm.client.RateLimitError: Rate limit exceeded (HTTP 429)
Why it happens
The LiteLLM API provider enforces rate limits to prevent abuse and ensure fair usage. When your application sends requests too frequently or exceeds its quota, the server responds with HTTP 429 (Too Many Requests), which the client raises as RateLimitError.
Detection
Monitor API response status codes and catch litellm.client.RateLimitError exceptions to detect rate limiting before your app crashes or degrades.
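One way to make detection concrete is to count rate-limit hits over a sliding time window and log each one, so throttling shows up in your logs before it becomes an outage. The sketch below is illustrative only: `RateLimitError` is a local stand-in for the library's exception, and `RateLimitMonitor` and `call_and_track` are hypothetical helper names, not part of any library.

```python
import logging
import time
from collections import deque

logger = logging.getLogger("rate_limit")


class RateLimitError(Exception):
    """Stand-in for the library's rate-limit exception (assumption)."""


class RateLimitMonitor:
    """Counts rate-limit hits seen within a sliding time window."""

    def __init__(self, window_seconds=60.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self._clock = clock
        self._hits = deque()

    def record(self):
        # Store the time of this hit.
        self._hits.append(self._clock())

    def recent_hits(self):
        # Drop hits that have fallen out of the window, then count the rest.
        cutoff = self._clock() - self.window_seconds
        while self._hits and self._hits[0] < cutoff:
            self._hits.popleft()
        return len(self._hits)


def call_and_track(fn, monitor):
    """Call fn(); on a rate-limit error, record and log it, then re-raise."""
    try:
        return fn()
    except RateLimitError:
        monitor.record()
        logger.warning(
            "Rate limited: %d hit(s) in the last %.0f s",
            monitor.recent_hits(),
            monitor.window_seconds,
        )
        raise
```

In a real application you would import the library's own exception instead of the stand-in and wrap each API call in `call_and_track`; a rising `recent_hits()` value is a signal to slow down.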
Causes & fixes
Sending requests too rapidly, exceeding the provider's rate limit
Implement exponential backoff and retry logic with delays between requests to stay within rate limits.
Using an API key with a low quota or expired subscription
Check your LiteLLM account dashboard to verify your quota and renew or upgrade your subscription if needed.
Multiple parallel processes or threads making concurrent requests without coordination
Serialize or limit concurrent API calls using a rate limiter or queue to avoid bursts that trigger rate limits.
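For the concurrency cause above, a semaphore is one simple way to cap how many requests are in flight at once within a single process. This is a minimal sketch under stated assumptions: `ConcurrencyLimiter` is a hypothetical helper name, and the right limit depends on your provider's quota, which is not specified here.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyLimiter:
    """Allows at most max_concurrent calls to run at the same time."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def run(self, fn, *args, **kwargs):
        # Blocks until a slot is free, then releases it when fn returns or raises.
        with self._slots:
            return fn(*args, **kwargs)


def run_all(limiter, fn, items, workers=8):
    """Run fn over items on a thread pool while respecting the limiter."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda item: limiter.run(fn, item), items))
```

A semaphore only coordinates threads inside one process. Separate processes or machines sharing one API key would need a shared limiter, such as a queue or an external store, which this sketch does not cover.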
Code: broken vs fixed
# Broken: no handling for rate limiting
from litellm import Client

client = Client(api_key="my_api_key")  # Hardcoded key is a security risk
messages = [{"role": "user", "content": "Hello"}]
response = client.chat.completions.create(model="litellm-1.0", messages=messages)  # An unhandled RateLimitError (HTTP 429) crashes the app

# Fixed: retry with exponential backoff
import os
import time

from litellm import Client, RateLimitError

client = Client(api_key=os.environ["LITELLM_API_KEY"])  # Read the API key from an environment variable
messages = [{"role": "user", "content": "Hello"}]
max_retries = 3
for attempt in range(max_retries):
    try:
        response = client.chat.completions.create(model="litellm-1.0", messages=messages)
        print(response)
        break
    except RateLimitError:
        wait_time = 2 ** attempt  # Exponential backoff: 1 s, 2 s, 4 s
        print(f"Rate limit hit, retrying in {wait_time} seconds...")
        time.sleep(wait_time)
else:
    # Runs only if the loop finished without a successful request (no break)
    print("Failed after retries due to rate limiting.")
Workaround
Catch RateLimitError exceptions and pause your application for a fixed delay (e.g., 30 seconds) before retrying the request to avoid immediate failures.
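The fixed-delay workaround can be sketched as a small retry helper. The names here are illustrative assumptions: `RateLimitError` is a local stand-in for the library's exception, `retry_with_fixed_delay` is a hypothetical helper, and the `sleep` parameter exists only so the delay can be replaced in tests.

```python
import time


class RateLimitError(Exception):
    """Stand-in for the library's rate-limit exception (assumption)."""


def retry_with_fixed_delay(fn, delay_seconds=30.0, max_attempts=3, sleep=time.sleep):
    """Call fn(); on a rate-limit error, wait a fixed delay and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts:
                # Out of attempts: let the caller see the error.
                raise
            sleep(delay_seconds)
```

A fixed delay is simple but can wait longer than necessary or retry too soon; it is best treated as a stopgap rather than a replacement for backoff or a rate limiter.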
Prevention
Use a centralized rate limiter in your application to keep request volume below the provider's limits, and monitor usage quotas proactively so you can act before they are exhausted.
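A token bucket is one common way to build such a limiter: tokens refill at a steady rate, each request spends one, and callers wait when the bucket is empty. The sketch below is a minimal in-process version. `TokenBucket` is a hypothetical name, and the rate and capacity values are placeholders to be replaced with your provider's actual limits.

```python
import threading
import time


class TokenBucket:
    """Token-bucket limiter: sustained rate_per_sec with bursts up to capacity."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate_per_sec
        self.capacity = capacity
        self._tokens = float(capacity)  # Start with a full bucket
        self._clock = clock
        self._sleep = sleep
        self._last = clock()
        self._lock = threading.Lock()

    def acquire(self):
        """Block until one token is available, then consume it."""
        while True:
            with self._lock:
                now = self._clock()
                # Refill according to elapsed time, never above capacity.
                elapsed = now - self._last
                self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
                self._last = now
                if self._tokens >= 1:
                    self._tokens -= 1
                    return
                # Time until the bucket holds one whole token.
                wait = (1 - self._tokens) / self.rate
            # Sleep outside the lock so other threads are not blocked.
            self._sleep(wait)
```

Calling `bucket.acquire()` immediately before each API request keeps one process under the configured rate. As with any in-process limiter, it does not coordinate across multiple processes sharing the same key.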