AWS Bedrock rate limits explained
Quick answer
AWS Bedrock enforces rate limits (service quotas) that cap how many requests, and for model inference how many tokens, you can send per minute for each model in each Region. When you exceed a limit, the call fails with a ThrottlingException error. Wrapping your API calls in retry logic with exponential backoff handles these limits gracefully.
Quick fix
Add exponential backoff retry logic around your API call so that ThrottlingException is retried automatically.
Why this happens
AWS Bedrock applies rate limits to API requests to ensure fair usage and maintain service stability. If your application sends too many requests in a short time, the service returns a ThrottlingException error with HTTP status code 429.
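Typical triggering code looks like this. A single call usually succeeds; the error appears when calls like this run many times in quick succession, for example in a tight loop or from many concurrent workers:
```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    # Converse content blocks take {"text": ...}, with no "type" key
    messages=[{"role": "user", "content": [{"text": "Hello"}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```
Output when the rate limit is exceeded:
```
ThrottlingException: Rate exceeded
```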
The fix
Implement exponential backoff retry logic to handle ThrottlingException. The request is retried after progressively longer delays (1, 2, 4, 8 seconds, and so on), which spreads retries out instead of sending another burst while the quota is still exhausted.
Example with retries using boto3 and botocore exceptions:
```python
import time

import boto3
from botocore.exceptions import ClientError

client = boto3.client("bedrock-runtime", region_name="us-east-1")

max_retries = 5
retry_delay = 1  # seconds; doubled after each throttled attempt

for attempt in range(max_retries):
    try:
        response = client.converse(
            modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
            messages=[{"role": "user", "content": [{"text": "Hello"}]}],
        )
        print(response["output"]["message"]["content"][0]["text"])
        break  # success, stop retrying
    except ClientError as e:
        if e.response["Error"]["Code"] != "ThrottlingException":
            raise  # not a rate-limit error, so retrying won't help
        if attempt < max_retries - 1:  # no point sleeping after the final attempt
            print(f"Rate limit hit, retrying in {retry_delay}s...")
            time.sleep(retry_delay)
            retry_delay *= 2  # exponential backoff: 1, 2, 4, 8 seconds
else:
    # Runs only if the loop never hit `break`, i.e. every attempt was throttled
    print("Max retries exceeded.")
```
On success this prints the model's reply. If the first attempt is throttled, the reply is preceded by:
```
Rate limit hit, retrying in 1s...
```
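The loop above can be factored into a reusable helper. The sketch below is a hypothetical `call_with_backoff` function (not part of boto3 or botocore) that caps the delay and adds "full jitter": each wait is a random value between zero and the exponential ceiling, so many clients throttled at the same moment don't all retry in lockstep. The default attempt count and delay cap are illustrative assumptions; `sleep` and `rng` are injectable so the behaviour can be tested without real waiting.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 20.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call fn(), retrying retryable errors with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    if rng is None:
        rng = random.Random()
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            # Give up immediately on non-retryable errors or after the last attempt
            if not is_retryable(exc) or attempt == max_attempts - 1:
                raise
            # Exponential ceiling: base, 2*base, 4*base, ... capped at max_delay
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: wait a random time in [0, ceiling]
            sleep(rng.uniform(0, ceiling))
    raise AssertionError("unreachable: the loop always returns or raises")
```

To use it with the boto3 example, pass `lambda: client.converse(...)` as `fn` and a predicate that returns True when `isinstance(exc, ClientError) and exc.response["Error"]["Code"] == "ThrottlingException"`.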
Preventing it in production
- Use exponential backoff with jitter to avoid synchronized retries.
- Monitor API usage and throttle your client requests proactively.
- Implement circuit breakers to fail fast when limits are persistently hit.
- Cache frequent responses to reduce unnecessary calls.
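- Check the quotas that apply to your specific models and Regions in the AWS Bedrock documentation or the Service Quotas console, and request an increase for adjustable quotas you consistently hit.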
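One way to throttle client requests proactively is a token bucket that paces calls below your quota before Bedrock has to reject them. The sketch below is a hypothetical `TokenBucket` class, not part of boto3; the `rate` you pass in is an assumption you should derive from your actual quota (for a per-minute quota, divide by 60). It only coordinates threads within one process; several processes or hosts sharing a quota would need a shared limiter.

```python
import threading
import time
from typing import Callable


class TokenBucket:
    """Client-side limiter: about `rate` acquisitions per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if rate <= 0 or capacity < 1:
            raise ValueError("rate must be > 0 and capacity >= 1")
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.clock = clock
        self.sleep = sleep
        self.last = clock()
        self.lock = threading.Lock()

    def _refill(self) -> None:
        now = self.clock()
        # Add tokens for the elapsed time, never exceeding capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self) -> bool:
        """Take one token if available; return False instead of waiting."""
        with self.lock:
            self._refill()
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

    def acquire(self) -> None:
        """Block until a token is available, then take it."""
        while True:
            with self.lock:
                self._refill()
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                # Time until one full token has accumulated
                wait = (1 - self.tokens) / self.rate
            self.sleep(wait)  # sleep outside the lock so other threads can refill/check
```

For example, with an assumed quota of 50 requests per minute you might create `bucket = TokenBucket(rate=50 / 60, capacity=5)` once per model and call `bucket.acquire()` immediately before each `client.converse(...)`.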
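A circuit breaker stops sending requests for a cooldown period after repeated throttling, so callers fail fast instead of piling retries onto a limit that is still exhausted. The hypothetical `CircuitBreaker` below is a minimal, single-threaded sketch (not a boto3 feature); the threshold and timeout values are illustrative assumptions. After the timeout it lets a trial call through: success closes the circuit, another failure re-opens it.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker is open and calls are being short-circuited."""


class CircuitBreaker:
    """Opens after `failure_threshold` failures with no success in between."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, fn: Callable[[], T], is_failure: Callable[[Exception], bool]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast without calling the service
                raise CircuitOpenError("circuit open; skipping call")
            # Timeout elapsed (half-open): let this trial call through
        try:
            result = fn()
        except Exception as exc:
            if is_failure(exc):
                self.failures += 1
                if self.opened_at is not None or self.failures >= self.failure_threshold:
                    # Trip the breaker (or re-trip after a failed trial) and restart the timeout
                    self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count
        self.failures = 0
        self.opened_at = None
        return result
```

Usage would look like `breaker.call(lambda: client.converse(...), is_throttle)`, where `is_throttle` is a hypothetical predicate that checks for a `ClientError` whose code is `"ThrottlingException"`.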
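For caching, a small time-to-live (TTL) cache keyed on the model ID and prompt avoids re-sending identical requests. The hypothetical `TTLCache` below is an in-memory sketch: it never evicts expired entries on its own, so a production version would need a size bound or a shared store. Model outputs can differ between calls, so caching fits best where reusing an earlier answer is acceptable.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries are treated as stale after `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[Hashable, Tuple[float, Any]] = {}  # key -> (stored_at, value)

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh cached value, no API call
        value = compute()  # miss or stale entry: call the service
        self._store[key] = (now, value)
        return value
```

A call site might use `cache.get_or_compute((model_id, prompt), lambda: client.converse(...))`, where `model_id` and `prompt` are whatever your application already passes to Bedrock.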
Key Takeaways
- AWS Bedrock rate limits trigger ThrottlingException when exceeded.
- Exponential backoff retries prevent request bursts and handle rate limits gracefully.
- Proactive monitoring and caching reduce the chance of hitting rate limits.