MCP server rate limiting best practices
Quick answer

When using MCP servers, implement exponential backoff with jitter to handle RateLimitError gracefully. This prevents overwhelming the server and ensures reliable communication between your AI agents and resources.

Error type: api_error
Quick fix: Add exponential backoff retry logic around your API call to handle RateLimitError automatically.

Why this happens
MCP servers enforce rate limits to protect resources and maintain stability. Excessive or rapid requests trigger RateLimitError, causing your client to receive HTTP 429 or similar errors. For example, calling client.messages.create() in a tight loop without delay can exceed allowed request rates.
Error output typically looks like:

```
RateLimitError: Too many requests, please slow down.
```

Broken example (no rate limiting, rapid calls):

```python
from anthropic import Anthropic, RateLimitError
import os

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

# Broken example: no rate limiting, rapid calls
for _ in range(10):
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=100,
        system="You are a helpful assistant.",
        messages=[{"role": "user", "content": "Hello"}]
    )
    print(response.content[0].text)
```

Output:

```
RateLimitError: Too many requests, please slow down.
```
The fix
Implement exponential backoff with jitter to retry requests after receiving a RateLimitError. This approach spaces out retries, reducing server load and increasing success rates.
```python
from anthropic import Anthropic, RateLimitError
import os
import time
import random

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

max_retries = 5
base_delay = 1  # seconds

for attempt in range(max_retries):
    try:
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=100,
            system="You are a helpful assistant.",
            messages=[{"role": "user", "content": "Hello"}]
        )
        print(response.content[0].text)
        break  # success
    except RateLimitError:
        # Exponential backoff: 1s, 2s, 4s, ... plus up to 0.5s of jitter
        delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
        print(f"Rate limited, retrying in {delay:.2f} seconds...")
        time.sleep(delay)
else:
    print("Failed after retries due to rate limiting.")
```

Output:

```
Rate limited, retrying in 1.23 seconds...
Hello! How can I assist you today?
```
Preventing it in production
- Use client-side rate limiting to throttle requests proactively.
- Validate request volume against MCP server documentation limits.
- Implement robust retry logic with exponential backoff and jitter as shown.
- Monitor error rates and adjust request frequency dynamically.
- Consider queuing or batching requests to reduce peak load.
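The first bullet, proactive client-side throttling, can be sketched with a simple token bucket. This is a minimal illustration, not an official MCP or Anthropic API; the class name and the 5-requests-per-second limit are assumptions for the example:

```python
import time
import threading

class TokenBucket:
    """Minimal token-bucket rate limiter: allows roughly `rate` requests
    per second, with bursts of up to `capacity` requests.
    Name and limits are illustrative, not from any MCP server's docs."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self) -> None:
        """Block until a token is available, then consume it."""
        while True:
            with self.lock:
                now = time.monotonic()
                # Refill tokens based on elapsed time, capped at capacity.
                self.tokens = min(
                    self.capacity,
                    self.tokens + (now - self.last) * self.rate,
                )
                self.last = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                wait = (1 - self.tokens) / self.rate
            time.sleep(wait)

# Throttle to an assumed 5 requests/second before each API call:
bucket = TokenBucket(rate=5, capacity=5)
# bucket.acquire()  # call this before each client.messages.create(...)
```

Calling `acquire()` before each request keeps the client under the limit instead of reacting to 429s after the fact; throttling and retry logic complement each other rather than replacing one another.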
Key Takeaways
- Use exponential backoff with jitter to handle MCP rate limits gracefully.
- Proactively throttle requests to avoid hitting server limits.
- Monitor and adapt request patterns based on error feedback.
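The backoff-with-jitter pattern from the fix can be packaged once and reused across call sites. A sketch of such a decorator follows; the name `with_backoff` and its defaults are illustrative, not part of any SDK:

```python
import time
import random
import functools

def with_backoff(max_retries=5, base_delay=1.0, jitter=0.5,
                 retry_on=(Exception,)):
    """Retry decorator with exponential backoff and jitter.
    Illustrative sketch: name and defaults are assumptions."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return fn(*args, **kwargs)
                except retry_on:
                    if attempt == max_retries - 1:
                        raise  # out of retries: surface the error
                    delay = base_delay * (2 ** attempt) + random.uniform(0, jitter)
                    time.sleep(delay)
        return wrapper
    return decorator

# Usage: wrap any call that may raise RateLimitError, e.g.
# @with_backoff(retry_on=(RateLimitError,))
# def ask(prompt): ...
```

This keeps the retry policy in one place, so tuning `max_retries` or `base_delay` does not require touching every API call.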