High severity · HTTP 429 · Intermediate · Fix: 2-5 min

RateLimitError

litellm.client.RateLimitError (HTTP 429)

What this error means
The upstream provider that LiteLLM called rejected the request because it exceeded the allowed rate limit or quota. LiteLLM surfaces the provider's HTTP 429 response as a RateLimitError.

Stack trace

traceback
Traceback (most recent call last):
  File "app.py", line 42, in <module>
    response = client.chat.completions.create(model="litellm-1.0", messages=messages)
  File "/usr/local/lib/python3.9/site-packages/litellm/client.py", line 210, in create
    raise RateLimitError("Rate limit exceeded")
litellm.client.RateLimitError: Rate limit exceeded (HTTP 429)
QUICK FIX
Catch RateLimitError and retry the request with exponential backoff, so that short bursts over the limit recover automatically.

Why it happens

LiteLLM forwards each request to an upstream model provider, and providers enforce rate limits to prevent abuse and ensure fair usage. When your application sends requests too frequently or exhausts its quota, the provider responds with HTTP 429 to throttle it, and LiteLLM raises RateLimitError.

Detection

Log API response status codes and catch RateLimitError so that rate limiting shows up in your monitoring before the app crashes or degrades.
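
One way to make rate limiting visible is to wrap each call and count rejections before re-raising them. The sketch below is an illustration under stated assumptions: RateLimitError is a local stand-in for the exception LiteLLM raises, and call_with_metrics and stats are hypothetical names, not part of any library.

```python
import logging
from collections import Counter

logger = logging.getLogger("rate_limit_monitor")


class RateLimitError(Exception):
    """Local stand-in for the library's exception, so the sketch runs offline."""


# Running totals that a dashboard or alert could read (hypothetical name).
stats = Counter()


def call_with_metrics(fn, *args, **kwargs):
    """Call fn, count successes and rate-limit rejections, and re-raise rejections."""
    try:
        result = fn(*args, **kwargs)
    except RateLimitError:
        stats["rate_limited"] += 1
        logger.warning("Rate limited (%d so far)", stats["rate_limited"])
        raise  # the caller still decides whether to retry
    stats["ok"] += 1
    return result
```

In a real application you would import RateLimitError from litellm instead of defining it, and alert when the rate_limited count grows faster than expected.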

Causes & fixes

1

Sending requests faster than the provider's rate limit allows

✓ Fix

Retry with exponential backoff, and space out requests so the sustained rate stays within the limit.

2

Using an API key with a low quota or expired subscription

✓ Fix

Check the dashboard of the provider behind the request (or your LiteLLM proxy key settings, if you route through a proxy) to verify your quota, and renew or upgrade your plan if needed.

3

Multiple parallel processes or threads making concurrent requests without coordination

✓ Fix

Serialize or limit concurrent API calls using a rate limiter or queue to avoid bursts that trigger rate limits.
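
The third fix can be sketched with a semaphore that caps how many calls are in flight at once. This is a minimal illustration, not a definitive implementation: MAX_IN_FLIGHT, limited_call, fake_request, and state are hypothetical names, and fake_request stands in for the real API call.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 2  # assumed cap; tune it to your provider's limits
_slots = threading.BoundedSemaphore(MAX_IN_FLIGHT)

# Bookkeeping for the demonstration only: current and peak concurrency.
_lock = threading.Lock()
state = {"in_flight": 0, "peak": 0}


def fake_request(i):
    """Stand-in for the real API call; records how many callers overlap."""
    with _lock:
        state["in_flight"] += 1
        state["peak"] = max(state["peak"], state["in_flight"])
    time.sleep(0.01)  # simulate network latency
    with _lock:
        state["in_flight"] -= 1
    return i * 2


def limited_call(fn, *args):
    """Run fn only while holding one of the MAX_IN_FLIGHT slots."""
    with _slots:
        return fn(*args)


# Eight worker threads compete, but at most MAX_IN_FLIGHT requests run at once.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda i: limited_call(fake_request, i), range(10)))
```

A semaphore bounds concurrency within one process only. Separate processes or machines sharing a key would need a shared limiter, such as a queue or an external store.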

Code: broken vs fixed

Broken - triggers the error
python
import litellm

messages = [{"role": "user", "content": "Hello"}]
# No error handling: when the provider returns 429, RateLimitError propagates and the script crashes.
# The hardcoded key is a separate problem (a security risk), not the cause of the 429.
response = litellm.completion(model="gpt-4o-mini", messages=messages, api_key="my_api_key")  # example model name
Fixed - works correctly
python
import os
import time

import litellm
from litellm import RateLimitError

MODEL = "gpt-4o-mini"  # example model name; use one your key can access
api_key = os.environ["LITELLM_API_KEY"]  # Read the key from the environment instead of hardcoding it
messages = [{"role": "user", "content": "Hello"}]

max_retries = 3
for attempt in range(max_retries):
    try:
        response = litellm.completion(model=MODEL, messages=messages, api_key=api_key)
        print(response)
        break
    except RateLimitError:
        if attempt < max_retries - 1:  # No point sleeping after the final attempt
            wait_time = 2 ** attempt  # Exponential backoff: 1s, then 2s
            print(f"Rate limit hit, retrying in {wait_time} seconds...")
            time.sleep(wait_time)
else:
    # Runs only if the loop never hit `break`, i.e. every attempt was rate limited
    print("Failed after retries due to rate limiting.")
The fixed version reads the API key from an environment variable and retries on RateLimitError with exponential backoff, so a temporary rate limit no longer crashes the script.
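
When many clients back off on the same schedule, they tend to retry at the same moment. A common refinement is to cap the delay and add random jitter. The helper below is a sketch under stated assumptions: RateLimitError is a local stand-in for the library's exception, and retry_with_backoff and its parameters are hypothetical names chosen for illustration.

```python
import random
import time


class RateLimitError(Exception):
    """Local stand-in for the library's exception, so the sketch runs offline."""


def retry_with_backoff(fn, max_retries=3, base_delay=1.0, max_delay=30.0,
                       sleep=time.sleep, rng=random.random):
    """Call fn, retrying on RateLimitError with capped, jittered backoff.

    Makes up to max_retries attempts; the final failure is re-raised.
    """
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the error to the caller
            # Exponential ceiling, capped at max_delay, scaled by a random factor in [0, 1)
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(ceiling * rng())
```

The sleep and rng parameters exist only so the delays can be controlled in tests. In normal use you would pass just the callable, for example a lambda that wraps the completion call.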

Workaround

Catch RateLimitError exceptions and pause your application for a fixed delay (e.g., 30 seconds) before retrying the request to avoid immediate failures.

Prevention

Use a centralized rate limiter in your application to throttle requests below the provider's limits and monitor usage quotas proactively to avoid hitting rate limits.
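
A centralized limiter can be as small as a token bucket that every request must pass through. The class below is a minimal sketch, assuming a single process with threads: TokenBucket and its parameters are hypothetical names, and the rate and capacity values you pick should sit below your provider's published limits.

```python
import threading
import time


class TokenBucket:
    """Allow `rate` calls per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate
        self.capacity = capacity
        self._tokens = float(capacity)  # start full so the first burst is immediate
        self._clock = clock
        self._sleep = sleep
        self._last = clock()
        self._lock = threading.Lock()

    def acquire(self):
        """Block until one token is available, then consume it."""
        while True:
            with self._lock:
                now = self._clock()
                # Refill in proportion to elapsed time, never above capacity
                self._tokens = min(self.capacity,
                                   self._tokens + (now - self._last) * self.rate)
                self._last = now
                if self._tokens >= 1:
                    self._tokens -= 1
                    return
                wait = (1 - self._tokens) / self.rate
            # Sleep outside the lock so other threads are not blocked meanwhile
            self._sleep(wait)
```

Create one bucket at startup and call acquire() immediately before each API request. The clock and sleep parameters are there only to make the timing testable.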

Python 3.9+ · litellm >=0.1.0 · tested on 0.2.5
Verified 2026-04