ThrottlingException
botocore.exceptions.ClientError: An error occurred (ThrottlingException) when calling the InvokeEndpoint operation: Rate exceeded
Stack trace
botocore.exceptions.ClientError: An error occurred (ThrottlingException) when calling the InvokeEndpoint operation: Rate exceeded
  File "/app/main.py", line 42, in invoke_bedrock
    response = client.invoke_endpoint(
  File "/usr/local/lib/python3.9/site-packages/botocore/client.py", line 530, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.9/site-packages/botocore/client.py", line 960, in _make_api_call
    raise error_class(parsed_response, operation_name)

Why it happens
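Amazon Bedrock enforces quotas on model invocation. For on-demand use these are typically expressed as requests per minute and tokens per minute, per model and per Region. When your application sends requests faster than its quota allows, the service rejects the excess calls with a ThrottlingException ("Rate exceeded") to signal that the client must slow down. Note that InvokeEndpoint, the operation named in the trace above, is the SageMaker Runtime API. Bedrock models are invoked through the bedrock-runtime client's InvokeModel operation, which reports throttling with the same ThrottlingException error code.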
Detection
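Catch botocore.exceptions.ClientError and check whether e.response['Error']['Code'] is 'ThrottlingException'. Count these errors and watch the Bedrock invocation metrics in Amazon CloudWatch. Together they show when traffic is approaching the limit, before requests start failing outright.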
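The check described above can be wrapped in a small helper. This is a minimal sketch; the name is_throttling_error is hypothetical. It duck-types on the response attribute that botocore's ClientError carries, so it runs without botocore installed. In real code you would call it from an except ClientError block.

```python
from typing import Any


def is_throttling_error(exc: BaseException) -> bool:
    """Return True if exc carries a botocore-style error response whose
    code is 'ThrottlingException'.

    botocore's ClientError stores the parsed error in exc.response, with the
    code at exc.response['Error']['Code']. Reading it defensively makes the
    check safe to call on any exception, not only ClientError.
    """
    response: Any = getattr(exc, "response", None)
    if not isinstance(response, dict):
        return False
    error = response.get("Error")
    if not isinstance(error, dict):
        return False
    return error.get("Code") == "ThrottlingException"
```

You get the early warning signal by incrementing a counter or emitting a metric each time the helper returns True.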
Causes & fixes
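Cause: Too many concurrent or rapid requests to the Bedrock model invocation API exceed the account's rate quota.
Fix: Retry throttled calls with exponential backoff plus random jitter. The growing, randomized waits spread retries out over time instead of sending them in synchronized bursts. botocore's built-in retry modes (standard and adaptive, configured through botocore.config.Config(retries=...)) already apply this to throttling errors.
Cause: The client application has no request rate limiting or batching, so traffic arrives in bursts.
Fix: Add client-side rate limiting (for example, a token bucket) or batch requests to smooth traffic and stay within the allowed call rate.
Cause: The account is running on the default Bedrock service quota, which is too low for the workload.
Fix: Request a quota increase through the Service Quotas console or AWS Support. Some Bedrock quotas may not be adjustable; for sustained high volume, Provisioned Throughput is the alternative.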
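Client-side rate limiting can be implemented with a token bucket. Tokens refill at a steady rate, each request spends one, and a full bucket permits a short burst. The sketch below is a generic, single-process limiter that uses only the standard library. The class name and the rate and capacity values are assumptions; you would derive the real values from your own Bedrock quota. The clock and sleep parameters exist only so the behavior can be tested without real waiting.

```python
import threading
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts of up to
    `capacity` requests."""

    def __init__(self, rate: float, capacity: float,
                 clock=time.monotonic, sleep=time.sleep):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.clock = clock
        self.sleep = sleep
        self.last = clock()
        self.lock = threading.Lock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last refill, up to capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self) -> bool:
        """Take one token if one is available; never blocks."""
        with self.lock:
            self._refill()
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

    def acquire(self) -> None:
        """Block until a token is available, then take it."""
        while True:
            with self.lock:
                self._refill()
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                # Time until the bucket holds one full token again.
                wait = (1 - self.tokens) / self.rate
            self.sleep(wait)  # sleep outside the lock so other threads can proceed
```

Calling bucket.acquire() immediately before each invoke_model call keeps a single process under the configured rate. If several processes or hosts share one quota, they need a shared limiter, or each must be given a fixed share of the quota.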
Code: broken vs fixed
Broken:

import boto3

# Bedrock model calls go through the "bedrock-runtime" client; the "bedrock"
# client is the control-plane API and has no invocation method.
client = boto3.client('bedrock-runtime')

response = client.invoke_model(
    modelId='my-model-id',  # placeholder model ID
    body=b'{}'  # the request body format is model-specific
)  # Raises ClientError (ThrottlingException) when the rate limit is exceeded
print(response)

Fixed:

import os
import random
import time

import boto3
from botocore.exceptions import ClientError

# Credentials come from the default chain (environment variables, shared
# config, or an IAM role), so they do not need to be passed explicitly.
client = boto3.client('bedrock-runtime',
                      region_name=os.environ.get('AWS_REGION', 'us-east-1'))


def invoke_with_retry(payload, max_retries=5):
    """Invoke the model, retrying throttled calls with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return client.invoke_model(
                modelId=os.environ['BEDROCK_MODEL_ID'],  # hypothetical variable name
                body=payload,
            )
        except ClientError as e:
            if e.response['Error']['Code'] != 'ThrottlingException':
                raise  # only throttling errors are retried
            if attempt == max_retries - 1:
                break  # do not sleep after the final attempt
            # 1 s, 2 s, 4 s, ... plus up to 1 s of random jitter
            sleep_time = (2 ** attempt) + random.uniform(0, 1)
            print(f"Throttled, retrying in {sleep_time:.2f} seconds...")
            time.sleep(sleep_time)
    raise RuntimeError("Max retries exceeded due to throttling")


response = invoke_with_retry(b'{}')  # Fixed: added retry with exponential backoff
print(response)

Workaround
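As a stopgap, catch the ThrottlingException and retry in a manual loop after a fixed delay. This lowers the request rate temporarily until the quota window resets. Unlike exponential backoff, however, the wait does not grow if throttling persists.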
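Here is a minimal sketch of the fixed-delay workaround, written as a generic helper so it can wrap any call. The names retry_fixed_delay and is_retryable and the 2-second default delay are illustrative assumptions, not part of boto3. The sleep parameter is injectable only so the helper can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_fixed_delay(
    call: Callable[[], T],
    is_retryable: Callable[[BaseException], bool],
    delay: float = 2.0,
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(); when it raises a retryable error, wait a fixed `delay`
    seconds and try again, up to `max_attempts` calls in total."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:
            if not is_retryable(exc) or attempt == max_attempts:
                raise  # not a throttling error, or no attempts left
            sleep(delay)  # same wait every time, unlike exponential backoff
    raise RuntimeError("unreachable: the loop always returns or raises")
```

With boto3, call would be a lambda around client.invoke_model(...). The matching is_retryable would be a predicate returning True when e.response['Error']['Code'] equals 'ThrottlingException'.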
Prevention
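Design the application to stay within its quota from the start. Rate-limit requests on the client, batch work where possible, and retry throttled calls with exponential backoff. Request quota increases ahead of expected traffic growth rather than after throttling begins.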
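The retry loop in the fixed code above waits 2**attempt seconds plus up to 1 second of jitter, with no upper bound. A common refinement is capped exponential backoff with "full jitter". It picks a random delay anywhere between zero and a capped exponential ceiling, which spreads out retries from many clients and prevents very long waits. The sketch below shows one such delay function. The name full_jitter_delay and the 1-second base and 20-second cap are illustrative assumptions to tune against your own quota.

```python
import random
from typing import Optional


def full_jitter_delay(attempt: int, base: float = 1.0, cap: float = 20.0,
                      rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based): a uniform
    random value between 0 and min(cap, base * 2**attempt)."""
    if attempt < 0:
        raise ValueError("attempt must be >= 0")
    source = rng if rng is not None else random
    # Limit the exponent so very large attempt counts cannot overflow a float.
    exponent = min(attempt, 32)
    ceiling = min(cap, base * 2.0 ** exponent)
    return source.uniform(0, ceiling)
```

In the fixed code, this function would replace the sleep_time expression: time.sleep(full_jitter_delay(attempt)).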