RateLimitError: too many requests
Why this matters
In production, rate limits are a fact of life: your application will hit them. Handling RateLimitError gracefully (with exponential backoff) prevents crashes and maintains user trust instead of failing silently or returning garbage.
Explanation
What it does: The Anthropic API responds with HTTP 429 and a rate_limit_error when your organization exceeds one of its rate limits; the Python SDK raises this as anthropic.RateLimitError. The response includes a retry-after header that tells you how many seconds to wait before retrying.
How it works: Anthropic enforces limits per model on requests per minute and on input and output tokens per minute, using a token bucket that refills continuously rather than resetting at fixed intervals. When you cross a threshold, the API rejects new requests instead of queuing them. The error response carries HTTP 429 (Too Many Requests) and the retry-after header. Unlike some APIs that rate-limit silently, Anthropic gives you explicit feedback so your code can react intelligently.
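To make the "reject instead of queue" behavior concrete, here is a toy token bucket. It is only an illustration of the general technique, not Anthropic's implementation; the class name, capacity, and refill rate are all made up for the example.

```python
class TokenBucket:
    """Toy token bucket: capacity refills continuously, excess requests are rejected."""

    def __init__(self, capacity: float, refill_per_second: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last call, never above capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True   # request accepted
        return False      # request rejected (the analogue of an HTTP 429)


# Hypothetical limits: a burst of 2 requests, refilling 1 request per second.
bucket = TokenBucket(capacity=2, refill_per_second=1.0)
results = [bucket.allow(now=t) for t in (0.0, 0.1, 0.2, 1.5)]
print(results)  # the third request arrives before the bucket has refilled
```

The third call is rejected rather than delayed, which is why the waiting has to happen on the client side.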
When to use it: Always wrap API calls in a retry handler that catches RateLimitError. Use exponential backoff (wait longer after each failure) rather than hammering the API immediately. This is especially important in loops, batch operations, or when multiple workers call the API concurrently.
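A retry handler of this kind can be sketched without any SDK at all. The helper below is a minimal sketch, not part of the Anthropic SDK: retry_with_backoff, FakeRateLimit, and flaky are hypothetical names, and in real code you would pass anthropic.RateLimitError as retry_on.

```python
import time
from typing import Callable, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    retry_on: Type[BaseException],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on retry_on, wait base_delay * 2**attempt, then try again."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")


class FakeRateLimit(Exception):
    """Stand-in for a rate limit error, so the demo runs offline."""


calls = {"n": 0}


def flaky() -> str:
    # Fails twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeRateLimit()
    return "ok"


waits = []  # record the waits instead of actually sleeping
result = retry_with_backoff(flaky, FakeRateLimit, sleep=waits.append)
print(result, waits)
```

Passing sleep as a parameter keeps the helper testable: the demo records the delays instead of blocking.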
Request code
```python
import time

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def call_with_retry(max_retries: int = 3):
    for attempt in range(max_retries):
        try:
            message = client.messages.create(
                model="claude-opus-4-6",
                max_tokens=100,
                messages=[
                    {"role": "user", "content": "Say 'hello world'"}
                ],
            )
            print(f"Success on attempt {attempt + 1}")
            print(message.content[0].text)
            return message
        except anthropic.RateLimitError as e:
            if attempt == max_retries - 1:
                print(f"Failed after {max_retries} attempts")
                raise
            # Prefer the server's retry-after header (seconds); if it is
            # missing or unparseable, fall back to exponential backoff: 1s, 2s, 4s, ...
            retry_after = e.response.headers.get("retry-after")
            try:
                wait_seconds = float(retry_after)
            except (TypeError, ValueError):
                wait_seconds = 2.0 ** attempt
            print(f"Rate limited. Waiting {wait_seconds:.1f}s before retry...")
            time.sleep(wait_seconds)


call_with_retry()
```
Authentication
Set your API key as an environment variable before running:
```bash
export ANTHROPIC_API_KEY='your-api-key-here'
```
The Anthropic SDK reads this automatically when you instantiate the client.
Response shape
| Field | Description |
|---|---|
| status_code | 429 (HTTP status when rate limited) |
| error_type | rate_limit_error in the JSON error body; raised as anthropic.RateLimitError by the Python SDK |
| message | String describing the rate limit (e.g., 'Rate limit exceeded') |
| retry-after | Response header: number of seconds to wait before the next request |
Field guide
retry-after The server's suggested wait, in seconds, sent as a response header. Use this value rather than a guess; it is what keeps you from hammering the API.
error_type rate_limit_error in the API's error body, surfaced as anthropic.RateLimitError by the Python SDK. Other errors (AuthenticationError, APIError) need different handling, and catching only RateLimitError in the retry path avoids masking real bugs.
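One way to keep the retry path narrow is to decide by HTTP status. The policy below is an assumption made for illustration, not an official list: it treats 429 (rate limited), 500 (server error), and 529 (overloaded) as retryable and everything else, including 401 authentication failures, as a bug to fix rather than retry. should_retry and RETRYABLE_STATUS are hypothetical names.

```python
# Assumed policy: only transient conditions are worth retrying.
RETRYABLE_STATUS = {429, 500, 529}


def should_retry(status_code: int) -> bool:
    """Return True only for statuses that a later retry could plausibly fix."""
    return status_code in RETRYABLE_STATUS


for code in (401, 429, 529):
    print(code, "retry" if should_retry(code) else "fail fast")
```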
Setup trap
The Anthropic SDK reads ANTHROPIC_API_KEY at instantiation time. If you set the environment variable after creating the client, the client won't pick it up. Set the environment variable *before* instantiating the client, or pass the key directly: `client = anthropic.Anthropic(api_key='sk-...')`.
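A small guard can catch this trap at startup instead of at the first request. This is a sketch using only the standard library; require_api_key is a hypothetical helper, not part of the SDK.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key, or fail fast with a clear message if it is unset."""
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set; export it before creating the client"
        )
    return key


# Typical use, before constructing the client:
#   api_key = require_api_key()
#   client = anthropic.Anthropic(api_key=api_key)
```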
Cost
Retries are not free. A request rejected with HTTP 429 is not processed, so it should not add token charges, but every retry adds latency and ties up a worker, and retrying after a timeout or partial failure can mean paying for the same task twice. Use exponential backoff with a ceiling (e.g., max 30 seconds) and a cap on attempts to avoid runaway costs. For high-volume workloads, contact Anthropic about higher rate limits before deploying.
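The effect of a ceiling is easy to see by listing the delays. This sketch assumes a 1-second base and the 30-second ceiling mentioned above; backoff_delays is a hypothetical helper.

```python
from typing import List


def backoff_delays(retries: int, base: float = 1.0, ceiling: float = 30.0) -> List[float]:
    """Exponential delays (base * 2**i), each capped at ceiling seconds."""
    return [min(ceiling, base * 2 ** i) for i in range(retries)]


delays = backoff_delays(7)
total = sum(delays)
print(delays)  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
print(total)   # 91.0 seconds of waiting in the worst case
```

Without the ceiling, the sixth and seventh waits would be 32 and 64 seconds; with it, the worst-case total stays predictable.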
Rate limits
Rate limits depend on your organization's usage tier and are set per model: requests per minute, input tokens per minute, and output tokens per minute. Your current limits are listed in the Anthropic Console. Burst traffic (e.g., processing a queue in parallel) will hit limits fast. Always implement exponential backoff, not linear.
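For bursty workloads, spacing requests on the client side can avoid most 429s before they happen. The pacer below is a minimal, single-threaded sketch; the 60 requests-per-minute budget is a made-up number, and Pacer is a hypothetical class (add a lock if several threads share one instance).

```python
class Pacer:
    """Spaces calls so they stay under a requests-per-minute budget."""

    def __init__(self, requests_per_minute: float):
        self.interval = 60.0 / requests_per_minute  # seconds between requests
        self.next_slot = 0.0                        # earliest time the next request may go

    def delay_for(self, now: float) -> float:
        """Return how long to sleep before sending a request at time `now`."""
        wait = max(0.0, self.next_slot - now)
        # Reserve the following slot one interval after this request's send time.
        self.next_slot = max(now, self.next_slot) + self.interval
        return wait


pacer = Pacer(requests_per_minute=60)  # hypothetical budget: one request per second
waits = [pacer.delay_for(now=0.0) for _ in range(3)]  # three requests arrive at once
print(waits)  # [0.0, 1.0, 2.0]
```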
Common gotcha
Developers often ignore the retry-after header and use a fixed wait time (like 1 second) or increment linearly. This wastes time if capacity returns in 100 ms, or fails if the API needs 5 seconds. Read the retry-after value from the error response, and fall back to exponential backoff only when it is missing.
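The Anthropic API reports the suggested wait in a retry-after response header, in seconds. A defensive way to read it might look like the sketch below; wait_from_headers is a hypothetical helper, and it assumes a plain mapping with lowercase keys (httpx headers, which the Python SDK exposes on the error's response, are case-insensitive).

```python
import math
from typing import Mapping


def wait_from_headers(headers: Mapping[str, str], fallback: float) -> float:
    """Return the retry-after value in seconds, or fallback if absent or unusable."""
    raw = headers.get("retry-after")
    try:
        seconds = float(raw)
    except (TypeError, ValueError):
        # Header missing, or not a plain number (e.g. an HTTP date).
        return fallback
    if not math.isfinite(seconds) or seconds < 0:
        return fallback
    return seconds


print(wait_from_headers({"retry-after": "5"}, fallback=1.0))  # 5.0
print(wait_from_headers({}, fallback=1.0))                    # 1.0
```

The fallback is where an exponential backoff value would go, so the header wins when present and the backoff covers the rest.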
Error recovery
RateLimitError, AuthenticationError, APIError
Experienced dev note
Rate limiting teaches you something critical: APIs are finite resources. The cheapest way to scale is to respect rate limits from day one. Exponential backoff plus jitter (a random delay) prevents a thundering herd when many workers retry simultaneously. Don't think 'we'll handle this later'; add it to your first API call wrapper. Also, monitor your actual request rate in production. If you're consistently near your limits, request a tier upgrade before you hit the wall.
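Jitter can be added in a few lines. The sketch below uses the "full jitter" variant, where each wait is drawn uniformly between zero and the capped exponential delay; the function name and default values are assumptions for illustration.

```python
import random
from typing import Optional


def jittered_delay(
    attempt: int,
    base: float = 1.0,
    ceiling: float = 30.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Pick a random wait in [0, min(ceiling, base * 2**attempt)] seconds."""
    rng = rng or random.Random()
    upper = min(ceiling, base * 2 ** attempt)
    return rng.uniform(0.0, upper)


# Two workers failing at the same moment draw different waits,
# so their retries are spread out instead of landing together.
worker_a = jittered_delay(attempt=3, rng=random.Random(1))
worker_b = jittered_delay(attempt=3, rng=random.Random(2))
print(f"{worker_a:.2f}s vs {worker_b:.2f}s")
```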
Check your understanding
Why is reading the retry-after header from the error response better than just waiting a fixed 1 second and retrying?
Answer hint
The retry-after value is dynamic: it reflects the actual state of Anthropic's rate limit bucket. A fixed wait is either too short (wastes retries) or too long (slows your app unnecessarily). Reading the actual value is both faster and more reliable.