API Beginner easy · 5 min

RateLimitError: too many requests

What you will learn
The Anthropic API enforces rate limits on requests; RateLimitError tells you when you've exceeded them and how long to wait before retrying.

Why this matters

In production, rate limits are a fact of life: your application will hit them. Handling RateLimitError gracefully (with exponential backoff) prevents crashes and maintains user trust instead of failing silently or returning garbage.

Skip if: You're building a throwaway script or testing locally with very low traffic. You are unlikely to hit rate limits there. Any production service, batch job, or multi-user application must handle this.

Explanation

What it does: The Anthropic API rejects a request with HTTP 429 when your usage exceeds your organization's limits, and the Python SDK raises that as RateLimitError. The response carries a `retry-after` header giving the number of seconds to wait before retrying. In the SDK you read it from the exception's response: `e.response.headers.get("retry-after")`.

How it works: Anthropic's servers track usage per organization and per model, measured in requests per minute and in input and output tokens per minute. When you cross a threshold, the API rejects new requests instead of queuing them. The error response has status 429 (Too Many Requests) and headers describing when you can try again. Unlike some APIs that rate-limit silently, Anthropic gives you explicit feedback so your code can react intelligently.

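When to use it: Make sure every API call sits behind a retry policy that handles RateLimitError. The Python SDK already retries 429s a small number of times by default (configurable with `max_retries` on the client); add your own handler when you need more attempts, logging, or custom waits. Use exponential backoff (wait longer after each failure) rather than hammering the API immediately. This is especially important in loops, batch operations, or when multiple workers call the API concurrently.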

Request code

python
import anthropic
import time

client = anthropic.Anthropic()

def call_with_retry(max_retries: int = 3):
    for attempt in range(max_retries):
        try:
            message = client.messages.create(
                model="claude-opus-4-6",
                max_tokens=100,
                messages=[
                    {"role": "user", "content": "Say 'hello world'"}
                ]
            )
            print(f"Success on attempt {attempt + 1}")
            print(message.content[0].text)
            return message
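        except anthropic.RateLimitError as e:
            if attempt < max_retries - 1:
                # The API sends the wait, in seconds, in the `retry-after` header.
                retry_after = e.response.headers.get("retry-after")
                try:
                    wait_seconds = float(retry_after)
                except (TypeError, ValueError):
                    # Header missing or not numeric: fall back to exponential backoff.
                    wait_seconds = float(2 ** attempt)
                print(f"Rate limited. Waiting {wait_seconds:.1f}s before retry...")
                time.sleep(wait_seconds)
            else:
                print(f"Failed after {max_retries} attempts")
                raise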

call_with_retry()

Authentication

Set your API key as an environment variable before running:

```bash
export ANTHROPIC_API_KEY='your-api-key-here'
```

The Anthropic SDK reads this automatically when you instantiate the client.

Response shape

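| Field | Description |
| --- | --- |
| status_code | 429 (HTTP status when rate limited); available as `e.status_code` |
| error type | RateLimitError (Python exception class); the JSON error body reports the type `rate_limit_error` |
| message | String describing which limit was exceeded |
| retry-after | Response header giving the seconds to wait before the next request; read it with `e.response.headers.get("retry-after")` |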

Field guide

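retry-after

The server's own wait time, in seconds, sent as a response header: use this value, not a guess. This is the field that prevents you from hammering the API. The exception has no `retry_after_ms` attribute; read the header from `e.response.headers`.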
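
As a minimal sketch of reading that value defensively: the API reports the wait in the `retry-after` response header, in seconds. The helper name `retry_after_seconds` is hypothetical, not part of the SDK, and it assumes a plain numeric header value, falling back to a default otherwise.

```python
from typing import Mapping, Optional


def retry_after_seconds(
    headers: Mapping[str, str], default: Optional[float] = None
) -> Optional[float]:
    """Return the wait from a `retry-after` header in seconds, or `default`.

    Hypothetical helper for illustration. A plain dict is case-sensitive,
    so this looks up the lowercase header name.
    """
    raw = headers.get("retry-after")
    if raw is None:
        return default
    try:
        seconds = float(raw)
    except ValueError:
        # Header present but not a plain number (e.g. an HTTP date): fall back.
        return default
    # Never return a negative sleep duration.
    return max(seconds, 0.0)
```

Inside an `except anthropic.RateLimitError as e:` block this would be called as `retry_after_seconds(e.response.headers, default=1.0)`.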

error_type

Always RateLimitError for rate limits: other errors (AuthenticationError, server errors) need different handling. Catch RateLimitError specifically rather than the base class APIError, which also matches every other API failure and can mask real bugs.

Setup trap

The Anthropic SDK reads ANTHROPIC_API_KEY when the client is instantiated. If you set the environment variable after creating the client, the client won't pick it up. Set the variable *before* instantiating the client, or pass the key directly: `client = anthropic.Anthropic(api_key='sk-...')`.
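
One way to surface this trap early is to check the variable yourself before building the client. This is a sketch only; `require_api_key` is a hypothetical helper, not something the SDK provides.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return ANTHROPIC_API_KEY from `env`, or raise a clear error if unset.

    Hypothetical helper: fails at startup instead of on the first request.
    """
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set; export it before creating the client."
        )
    return key
```

Calling `require_api_key()` at the top of your program, before `anthropic.Anthropic()`, turns a confusing first-request failure into an immediate, readable one.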

Cost

Retries cost time and, in some cases, money. A request rejected with a 429 is not processed, so it should not generate billable tokens, but a retry loop that also re-sends requests after timeouts or server errors can pay for the same task more than once. Use exponential backoff with a ceiling (e.g., max 30 seconds) and a cap on attempts to avoid runaway loops. For high-volume workloads, contact Anthropic about higher rate limits before deploying.
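
To see what a ceiling buys you, it helps to compute the worst-case wait up front. The sketch below assumes a 1-second base and the 30-second cap mentioned above; `backoff_schedule` is a hypothetical helper name.

```python
from typing import List


def backoff_schedule(retries: int, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Delays in seconds before each retry: base * 2**n, never above cap."""
    return [min(cap, base * (2 ** n)) for n in range(retries)]


delays = backoff_schedule(retries=6)
print(delays)       # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
print(sum(delays))  # 61.0
```

Without the cap the sixth delay would be 32 seconds and would keep doubling; with it, each additional retry adds at most 30 seconds to the total.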

Rate limits

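Rate limits depend on your organization's usage tier and on the model, and are measured in requests per minute and in input and output tokens per minute. Exact numbers change over time, so check the limits page in the Anthropic Console rather than relying on a figure quoted here. Burst traffic (e.g., processing a queue in parallel) will hit limits fast. Always implement exponential backoff, not linear.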
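
Backoff reacts after a limit is hit; a client-side throttle can keep a single worker from bursting in the first place. The class below is a sketch under stated assumptions: `MinIntervalThrottle` is a hypothetical name, the interval is something you derive from your own limit (for example, 60 divided by your requests-per-minute allowance), and it is not thread-safe.

```python
import time
from typing import Callable


class MinIntervalThrottle:
    """Space out calls so they start at least `interval` seconds apart.

    Hypothetical single-threaded helper; `clock` and `sleep` are injectable
    so the behavior can be checked without real waiting.
    """

    def __init__(
        self,
        interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule the following call one interval after this one starts.
        self._next_allowed = max(now, self._next_allowed) + self.interval
        return delay
```

Call `throttle.wait()` immediately before each API request. This smooths traffic from one process only; multiple workers still need backoff, because they share the same organization-level limit.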

Common gotcha

Developers often ignore the `retry-after` header and use a fixed wait time (like 1 second) or increment linearly. This wastes time if the limit clears sooner, or fails again if it needs 5 seconds. Always read and use the actual `retry-after` value from the error response, and fall back to exponential backoff only when it is missing.

Error recovery

RateLimitError
Raised when requests exceed your rate limit (HTTP 429). Fix: catch it, read the `retry-after` header from `e.response.headers`, sleep that many seconds, and retry. Use exponential backoff as the fallback for robustness.
AuthenticationError
Returned when ANTHROPIC_API_KEY is missing or invalid. This is not a rate limit: don't retry. Check your key and environment setup.
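InternalServerError
Raised for server errors (500s), not rate limits. Retry with exponential backoff. Note that APIError is the base class of every SDK exception, RateLimitError included, so catching it alone cannot tell these cases apart.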
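
The recovery rules above reduce to a small decision keyed on HTTP status. This is a rough sketch, not the SDK's own policy; `should_retry` is a hypothetical helper, and in the SDK the status is available as `e.status_code` on status-based exceptions.

```python
def should_retry(status_code: int) -> bool:
    """Rough retry policy keyed on HTTP status, mirroring the list above."""
    if status_code == 429:
        # Rate limited: wait, then retry.
        return True
    if status_code >= 500:
        # Server-side failure: retry with exponential backoff.
        return True
    # 401 and other client errors: fix the request or the key instead.
    return False
```

Keeping the decision in one function makes it easy to log and to adjust as you learn which failures are transient in your workload.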

Experienced dev note

Rate limiting teaches you something critical: APIs are finite resources. The cheapest way to scale is to respect rate limits from day one: exponential backoff + jitter (random delay) prevents thundering herd when many workers retry simultaneously. Don't think 'we'll handle this later'; add it to your first API call wrapper. Also: monitor your actual request rate in production. If you're consistently near limits, request a tier upgrade before you hit the wall.
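
A minimal sketch of backoff with jitter, using the "full jitter" variant (a uniformly random delay between zero and the capped exponential value). The function name, the 1-second base, and the 30-second cap are assumptions for illustration.

```python
import random
from typing import Optional


def full_jitter_delay(
    attempt: int,
    base: float = 1.0,
    cap: float = 30.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Random delay in [0, min(cap, base * 2**attempt)] seconds.

    `attempt` is zero-based. Pass a seeded `random.Random` as `rng` for
    reproducible tests; by default the module-level generator is used.
    """
    source = rng if rng is not None else random
    ceiling = min(cap, base * (2 ** attempt))
    return source.uniform(0.0, ceiling)
```

Because each worker draws its own random delay, workers that were rate limited at the same moment spread their retries out instead of colliding again.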

Check your understanding

Why is reading the `retry-after` header from the error response better than just waiting a fixed 1 second and retrying?

Answer hint

The `retry-after` value is dynamic: it reflects the actual state of your rate limit at the moment of the error. A fixed wait is either too short (the retry fails again) or too long (slows your app unnecessarily). Reading the actual value is both faster and more reliable.

VERSION anthropic 0.94.x uses the Messages API with RateLimitError. Older versions (pre-0.2.0) used different exception classes and the Completions API: this guide applies only to current versions.
