
AWS Bedrock rate limits explained

Quick answer

AWS Bedrock enforces rate limits, expressed as quotas on requests and tokens per minute for each model and region, to prevent service overload. When you exceed them, the API returns a ThrottlingException. Wrapping your API calls in retries with exponential backoff handles these limits gracefully.

ERROR TYPE api_error

QUICK FIX

Add exponential backoff retry logic around your API call to handle ThrottlingException automatically.

Why this happens

AWS Bedrock applies rate limits to API requests to ensure fair usage and maintain service stability. If your application sends too many requests in a short time, the service returns a ThrottlingException error with HTTP status code 429.

A call like the following, issued faster than the quota allows, surfaces the error:

python

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    messages=[{"role": "user", "content": [{"text": "Hello"}]}]
)
print(response["output"]["message"]["content"][0]["text"])

output

ThrottlingException: Rate exceeded

The fix

Implement exponential backoff retry logic to handle ThrottlingException. This approach retries the request after increasing delays, reducing request bursts and respecting rate limits.

Example with retries using boto3 and botocore exceptions:

python

import boto3
import time
from botocore.exceptions import ClientError

client = boto3.client("bedrock-runtime", region_name="us-east-1")

max_retries = 5
retry_delay = 1  # seconds

for attempt in range(max_retries):
    try:
        response = client.converse(
            modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
            # Converse content blocks are keyed by kind, e.g. {"text": ...}
            messages=[{"role": "user", "content": [{"text": "Hello"}]}]
        )
        print(response["output"]["message"]["content"][0]["text"])
        break
    except ClientError as e:
        if e.response["Error"]["Code"] != "ThrottlingException":
            raise  # not a rate-limit error, so do not retry
        if attempt < max_retries - 1:
            # Skip the wait after the final attempt
            print(f"Rate limit hit, retrying in {retry_delay} seconds...")
            time.sleep(retry_delay)
            retry_delay *= 2  # exponential backoff
else:
    # Runs only if the loop never hit `break`
    print("Max retries exceeded.")

output

Hello
# or
Rate limit hit, retrying in 1 seconds...
Hello
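
One possible refinement of the loop above is to randomize each wait so that many clients do not retry at the same instant. The sketch below is a minimal illustration of "full jitter" backoff, not an AWS-provided API: `call_with_backoff`, `is_throttled`, and the `sleep` and `rng` parameters are hypothetical names introduced here, and the delay values are assumptions to tune for your workload.

```python
import random
import time


def call_with_backoff(call, is_throttled, max_attempts=5, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep, rng=random.random):
    """Run call() and retry throttled failures with full-jitter backoff.

    Hypothetical helper: is_throttled(exc) decides whether an exception
    counts as a rate-limit error that is worth retrying.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            # Re-raise non-throttling errors, and give up after the last attempt.
            if not is_throttled(exc) or attempt == max_attempts - 1:
                raise
            # The cap doubles each attempt: base, 2*base, 4*base, ... up to max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: wait a random time between 0 and the cap.
            sleep(rng() * cap)
```

With boto3, `call` could be a lambda wrapping `client.converse(...)`, and `is_throttled` could check that the exception is a `ClientError` whose `e.response["Error"]["Code"]` equals `"ThrottlingException"`, as in the example above.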

Preventing it in production

Use exponential backoff with jitter to avoid synchronized retries.
Monitor API usage and throttle your client requests proactively.
Implement circuit breakers to fail fast when limits are persistently hit.
Cache frequent responses to reduce unnecessary calls.
Consult AWS Bedrock documentation for specific rate limit quotas per model and region.
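
The circuit breaker mentioned in the list above could be sketched roughly as follows. This is an illustrative, single-threaded sketch under stated assumptions, not a production implementation: `CircuitBreaker`, `CircuitOpenError`, the threshold, and the cooldown are all hypothetical choices, and for brevity every exception counts as a failure, whereas a real breaker for Bedrock would likely count only throttling errors.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    """Hypothetical breaker: opens after repeated failures, then cools down."""

    def __init__(self, failure_threshold=3, cooldown=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            # Fail fast while the cooldown is still running.
            if self.clock() - self.opened_at < self.cooldown:
                raise CircuitOpenError("circuit open, skipping call")
            # Cooldown elapsed: let one trial call through (half-open).
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                # Open (or re-open) the circuit and restart the cooldown.
                self.opened_at = self.clock()
            raise
        # A success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would route each request through `breaker.call(...)` and treat `CircuitOpenError` as a signal to return a cached or fallback response instead of sending another request.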

Related errors

Error	Cause	Quick fix
ThrottlingException	Exceeded API request rate limit	Add exponential backoff retries
AccessDeniedException	Invalid permissions or credentials	Check IAM policies and credentials
ValidationException	Malformed request parameters	Validate request payload before sending

Key Takeaways

AWS Bedrock rate limits trigger ThrottlingException when exceeded.
Exponential backoff retries prevent request bursts and handle rate limits gracefully.
Proactive monitoring and caching reduce the chance of hitting rate limits.

Verified 2026-04 · anthropic.claude-3-5-sonnet-20241022-v2:0
