Debug Fix intermediate · 3 min read

MCP server rate limiting best practices

Quick answer
When using MCP servers, implement exponential backoff with jitter to handle RateLimitError gracefully. This prevents overwhelming the server and ensures reliable communication between your AI agents and resources.
ERROR TYPE api_error
⚡ QUICK FIX
Add exponential backoff retry logic around your API call to handle RateLimitError automatically.

Why this happens

MCP servers enforce rate limits to protect shared resources and maintain stability. Sending requests too quickly triggers RateLimitError, which corresponds to an HTTP 429 response. For example, calling client.messages.create() in a tight loop with no delay can exceed the allowed request rate.

Error output typically looks like:

RateLimitError: Too many requests, please slow down.
python
from anthropic import Anthropic, RateLimitError
import os

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

# Broken example: no rate limiting, rapid calls
for _ in range(10):
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=100,
        system="You are a helpful assistant.",
        messages=[{"role": "user", "content": "Hello"}]
    )
    print(response.content[0].text)
output
RateLimitError: Too many requests, please slow down.

The fix

Implement exponential backoff with jitter to retry requests after receiving a RateLimitError. This approach spaces out retries, reducing server load and increasing success rates.

python
from anthropic import Anthropic, RateLimitError
import os
import time
import random

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

max_retries = 5
base_delay = 1  # seconds

for attempt in range(max_retries):
    try:
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=100,
            system="You are a helpful assistant.",
            messages=[{"role": "user", "content": "Hello"}]
        )
        print(response.content[0].text)
        break  # success
    except RateLimitError:
        if attempt == max_retries - 1:
            continue  # no retries left; skip the pointless final sleep
        delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
        print(f"Rate limited, retrying in {delay:.2f} seconds...")
        time.sleep(delay)
else:
    print("Failed after retries due to rate limiting.")
output
Rate limited, retrying in 1.23 seconds...
Hello! How can I assist you today?
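If you retry in several places, the same backoff logic is easier to reuse as a decorator. This is a sketch, not part of the Anthropic SDK: the decorator name, parameters, and injectable `sleep` hook are illustrative, and it works with any exception type.

python
import random
import time
from functools import wraps

def with_backoff(exc_type, max_retries=5, base_delay=1.0, jitter=0.5, sleep=time.sleep):
    """Retry the wrapped function on exc_type with exponential backoff plus jitter."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except exc_type:
                    if attempt == max_retries - 1:
                        raise  # out of retries: surface the last error
                    # Exponential backoff: 1s, 2s, 4s, ... plus random jitter
                    sleep(base_delay * (2 ** attempt) + random.uniform(0, jitter))
        return wrapper
    return decorator

You could then wrap your API call once, e.g. a function decorated with @with_backoff(RateLimitError) that calls client.messages.create(), and every call site gets the retry behavior for free.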

Preventing it in production

  • Use client-side rate limiting to throttle requests proactively.
  • Validate request volume against MCP server documentation limits.
  • Implement robust retry logic with exponential backoff and jitter as shown.
  • Monitor error rates and adjust request frequency dynamically.
  • Consider queuing or batching requests to reduce peak load.
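The first bullet, proactive client-side throttling, can be sketched with a simple token bucket. The class name and the rate/capacity numbers below are illustrative assumptions, not actual MCP server limits; check your server's documentation for real values.

python
import time

class TokenBucket:
    """Client-side throttle: at most `rate` requests per second on average,
    with bursts of up to `capacity` requests."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def try_acquire(self):
        now = self.clock()
        # Refill tokens for the elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should wait or queue the request

Before each client.messages.create() call, check try_acquire(); if it returns False, sleep briefly or enqueue the request instead of hitting the server and waiting for a 429.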

Key Takeaways

  • Use exponential backoff with jitter to handle MCP rate limits gracefully.
  • Proactively throttle requests to avoid hitting server limits.
  • Monitor and adapt request patterns based on error feedback.
Verified 2026-04 · claude-3-5-sonnet-20241022