High severity · HTTP 429 · Intermediate · Fix: 5-10 min

ThrottlingException

botocore.exceptions.ClientError: An error occurred (ThrottlingException) when calling the InvokeEndpoint operation: Rate exceeded

What this error means
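AWS Bedrock returns a ThrottlingException when your account sends requests (or tokens) faster than the quota allowed for that model in that Region. The throttled call is rejected, and further calls keep failing until your usage drops back under the quota. The operation name in the message shows which call was throttled: Bedrock model calls through the bedrock-runtime client report InvokeModel (or Converse), while InvokeEndpoint is the SageMaker Runtime operation.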

Stack trace

traceback
Traceback (most recent call last):
  File "/app/main.py", line 42, in invoke_bedrock
    response = client.invoke_endpoint(
  File "/usr/local/lib/python3.9/site-packages/botocore/client.py", line 530, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.9/site-packages/botocore/client.py", line 960, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (ThrottlingException) when calling the InvokeEndpoint operation: Rate exceeded

QUICK FIX
Add exponential backoff retries with jitter around your Bedrock API calls to automatically handle throttling exceptions.
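
A minimal sketch of the quick fix as a reusable helper. It assumes the exception carries a botocore-style error response (botocore's ClientError exposes it as e.response); call_with_backoff and is_throttled are hypothetical names, not part of boto3. The delay uses "full jitter": a random wait between 0 and a ceiling that doubles each attempt (base * 2**attempt), capped at cap seconds.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def is_throttled(exc: BaseException) -> bool:
    """Return True if exc carries a botocore-style error response with code ThrottlingException."""
    response = getattr(exc, "response", None)
    if not isinstance(response, dict):
        return False
    return response.get("Error", {}).get("Code") == "ThrottlingException"


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base: float = 1.0,
    cap: float = 20.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call fn(), retrying throttled attempts with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            # Non-throttling errors, and a throttle on the final attempt, propagate unchanged.
            if not is_throttled(exc) or attempt == max_attempts - 1:
                raise
            # Full jitter: sleep a random time in [0, min(cap, base * 2**attempt)].
            sleep(rng.uniform(0, min(cap, base * 2 ** attempt)))
    raise AssertionError("unreachable")
```

To use it, wrap the Bedrock call in a zero-argument function, for example call_with_backoff(lambda: client.invoke_model(modelId=model_id, body=payload)), where client is a bedrock-runtime client and model_id and payload are your own values. Passing sleep and rng in as parameters keeps the helper testable without real waits.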

Why it happens

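AWS Bedrock enforces quotas on on-demand inference, set per account, per model, and per Region, covering both requests per minute and tokens per minute. When your application exceeds either one, Bedrock rejects the extra calls with a ThrottlingException to signal that you must slow down. Because tokens count too, a few large prompts can be throttled even at a modest request rate.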

Detection

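Catch botocore.exceptions.ClientError and check e.response['Error']['Code'] == 'ThrottlingException' to tell throttling apart from other client errors. To spot rate-limit pressure before calls start failing outright, track how often throttles occur, for example with Bedrock's InvocationThrottles metric in Amazon CloudWatch.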
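
A sketch of both detection steps, assuming errors arrive as botocore-style exceptions with a .response dict. ThrottleMonitor is a hypothetical helper that counts throttles in a sliding time window and reports when the count reaches a threshold you choose; the 60-second window and threshold of 5 are illustrative defaults, not Bedrock values.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


def error_code(exc: BaseException) -> Optional[str]:
    """Return the AWS error code from a botocore-style exception, or None."""
    response = getattr(exc, "response", None)
    if not isinstance(response, dict):
        return None
    return response.get("Error", {}).get("Code")


class ThrottleMonitor:
    """Counts ThrottlingException occurrences within a sliding time window (not thread-safe)."""

    def __init__(self, window_seconds: float = 60.0, warn_threshold: int = 5,
                 clock: Callable[[], float] = time.monotonic):
        self.window_seconds = window_seconds
        self.warn_threshold = warn_threshold
        self.clock = clock
        self.events: Deque[float] = deque()

    def count(self) -> int:
        """Number of throttles recorded within the last window_seconds."""
        cutoff = self.clock() - self.window_seconds
        # Drop timestamps that have fallen out of the window.
        while self.events and self.events[0] < cutoff:
            self.events.popleft()
        return len(self.events)

    def record(self, exc: BaseException) -> bool:
        """Record exc if it is a throttle; return True once the window count reaches the threshold."""
        if error_code(exc) == "ThrottlingException":
            self.events.append(self.clock())
        return self.count() >= self.warn_threshold
```

Call monitor.record(e) in your except block; when it returns True, log a warning or lower your request rate before retries start running out.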

Causes & fixes

1

Too many concurrent or rapid requests to the Bedrock runtime API, exceeding your account's quotas

✓ Fix

Retry throttled calls with exponential backoff and jitter, so retries slow down and spread out instead of hitting the quota again immediately.

2

No client-side rate limiting or batching, so the application sends requests in bursts

✓ Fix

Add client-side rate limiting or batch requests to smooth traffic and stay within allowed API call rates.

3

Running on the default Bedrock quotas for your account and Region, which may be too low for your workload, without requesting an increase

✓ Fix

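Check the Bedrock quotas for your model and Region in the Service Quotas console and request an increase there; quotas that are not adjustable in Service Quotas may require an AWS Support case.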
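
For cause 1 (too many requests in flight at once), backoff only helps after a throttle has happened; capping concurrency on the client addresses the cause itself. A minimal sketch using the standard library's ThreadPoolExecutor; run_bounded is a hypothetical helper name, and max_in_flight is a value you would tune against your own quota.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_bounded(handler: Callable[[T], R], items: Iterable[T], max_in_flight: int = 4) -> List[R]:
    """Apply handler to every item with at most max_in_flight calls running at once.

    Results come back in input order; if handler raises, the exception is re-raised here.
    """
    # The executor never runs more than max_workers calls concurrently.
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(handler, items))
```

Here handler would be a function that makes one invoke_model call for one item. A concurrency cap bounds parallel calls but not requests per minute, so fast handlers can still exceed a per-minute quota.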
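
For cause 2, one common way to do client-side rate limiting is a token bucket: tokens refill at a steady rate, each request spends one, and a request that finds the bucket empty waits for the next token. A minimal thread-safe sketch; TokenBucket is a hypothetical class, and you would derive rate from your own requests-per-minute quota (quota / 60, with some headroom).

```python
import threading
import time
from typing import Callable


class TokenBucket:
    """Token-bucket rate limiter: about `rate` requests per second, bursts of up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        if rate <= 0 or capacity < 1:
            raise ValueError("rate must be > 0 and capacity >= 1")
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.clock = clock
        self.sleep = sleep
        self.updated = clock()
        self.lock = threading.Lock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last refill, never exceeding capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    def acquire(self) -> None:
        """Block until a token is available, then consume it."""
        while True:
            with self.lock:
                self._refill()
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                wait = (1 - self.tokens) / self.rate
            # Sleep outside the lock so other threads are not blocked while we wait.
            self.sleep(wait)
```

Create one bucket per model and Region, share it across all threads, and call bucket.acquire() immediately before each invoke_model call.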
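
For cause 3, it helps to see which Bedrock quotas apply to your model before filing a request. A sketch using the Service Quotas API through boto3 (the list_service_quotas paginator with ServiceCode 'bedrock'); matching_quotas and bedrock_quotas are hypothetical helper names, and because quota names differ by model, the filter matches on a keyword such as the model name.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

QuotaRow = Tuple[str, str, float, bool]  # (name, code, current value, adjustable)


def matching_quotas(pages: Iterable[Dict[str, Any]], keyword: str) -> Iterator[QuotaRow]:
    """Yield quotas from list_service_quotas pages whose name contains keyword (case-insensitive)."""
    needle = keyword.lower()
    for page in pages:
        for quota in page.get("Quotas", []):
            if needle in quota["QuotaName"].lower():
                yield (quota["QuotaName"], quota["QuotaCode"],
                       quota["Value"], quota.get("Adjustable", False))


def bedrock_quotas(keyword: str, region: str) -> List[QuotaRow]:
    """List Bedrock quotas in region whose names mention keyword (needs AWS credentials)."""
    import boto3  # imported here so matching_quotas() works without boto3 installed

    client = boto3.client("service-quotas", region_name=region)
    pages = client.get_paginator("list_service_quotas").paginate(ServiceCode="bedrock")
    return list(matching_quotas(pages, keyword))
```

For a quota marked adjustable, client.request_service_quota_increase(ServiceCode="bedrock", QuotaCode=code, DesiredValue=value) submits the increase request.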

Code: broken vs fixed

Broken - triggers the error
python
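import boto3

client = boto3.client('bedrock-runtime')  # model invocations use the runtime client

response = client.invoke_model(
    modelId='my-model-id',  # placeholder model ID
    body=b'{}'              # placeholder; the request body format depends on the model
)  # Raises ThrottlingException when the rate limit is exceeded; nothing retries it
print(response['body'].read())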
Fixed - works correctly
python
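import os
import time
import random
import boto3
from botocore.exceptions import ClientError

# Model calls use the 'bedrock-runtime' client ('bedrock' is the control-plane API and has no invoke_model).
# Credentials come from boto3's default chain (env vars, shared config, or an IAM role).
client = boto3.client('bedrock-runtime',
                      region_name=os.environ.get('AWS_REGION', 'us-east-1'))

def invoke_with_retry(payload, max_retries=5, max_backoff=20):
    # boto3 already retries throttled calls a few times internally;
    # this loop adds application-level retries with longer waits on top.
    for attempt in range(max_retries):
        try:
            return client.invoke_model(
                modelId=os.environ['BEDROCK_MODEL_ID'],  # hypothetical env var holding your model ID
                body=payload
            )
        except ClientError as e:
            # Re-raise anything that is not throttling, and the last throttle once retries run out.
            if e.response['Error']['Code'] != 'ThrottlingException' or attempt == max_retries - 1:
                raise
            # Exponential backoff (1, 2, 4, ... s, capped at max_backoff) plus up to 1 s of jitter
            sleep_time = min(max_backoff, 2 ** attempt) + random.uniform(0, 1)
            print(f"Throttled, retrying in {sleep_time:.2f} seconds...")
            time.sleep(sleep_time)

response = invoke_with_retry(b'{}')  # placeholder body: the JSON schema depends on the model
print(response['body'].read())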
The fix retries only ThrottlingException, waiting an exponentially growing delay plus random jitter between attempts, so a short burst over the quota no longer fails the call immediately; any other error is raised at once.

Workaround

Catch the ThrottlingException and implement a fixed delay retry loop manually to reduce request frequency temporarily until the rate limit resets.
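
A sketch of that workaround with a hypothetical retry_fixed_delay helper: it waits the same delay before each retry and retries only ThrottlingException, read from a botocore-style .response dict. The 5-second default is an arbitrary example, not a Bedrock-recommended value.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_fixed_delay(fn: Callable[[], T], delay: float = 5.0, attempts: int = 3,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(); on ThrottlingException wait `delay` seconds and try again, up to `attempts` calls."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            response = getattr(exc, "response", None)
            code = response.get("Error", {}).get("Code") if isinstance(response, dict) else None
            # Give up on non-throttling errors and after the last allowed attempt.
            if code != "ThrottlingException" or attempt == attempts:
                raise
            sleep(delay)  # same wait every time: no growth, no jitter
    raise AssertionError("unreachable")
```

The weakness of a fixed delay is that clients throttled at the same moment also retry at the same moment, which can trigger the next throttle; exponential backoff with jitter spreads those retries out.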

Prevention

Architect your application to use client-side rate limiting, batching, and exponential backoff retries, and request AWS quota increases proactively to avoid throttling.

Python 3.9+ · boto3 >=1.28.57 · tested on 1.28.x
Verified 2026-04