AWS Bedrock Provisioned Throughput explained
Quick answer
AWS Bedrock Provisioned Throughput lets you reserve a fixed capacity of LLM request units to ensure consistent performance and low latency for enterprise applications. It guarantees throughput by pre-allocating resources, avoiding throttling during peak demand. You configure it via AWS Bedrock API or console by specifying the desired throughput units.
PREREQUISITES
Python 3.8+AWS CLI configured with AWS credentialsboto3 library installed (pip install boto3)Access to AWS Bedrock service with Provisioned Throughput enabled
Setup
To use AWS Bedrock Provisioned Throughput, ensure you have AWS CLI configured with credentials that have permissions for Bedrock. Install boto3 for Python SDK access.
Install boto3:
pip install boto3 output
Requirement already satisfied: boto3 in /usr/local/lib/python3.8/site-packages (1.26.0)
Step by step
This example shows how to create a Bedrock client with boto3 and configure Provisioned Throughput for a Bedrock model endpoint. Provisioned Throughput is specified in capacityUnits to reserve request capacity.
import boto3
# Initialize Bedrock client
client = boto3.client('bedrock-runtime', region_name='us-east-1')
# Example: configure provisioned throughput for a model endpoint
# Note: Replace 'your-endpoint-name' with your actual Bedrock endpoint
response = client.update_provisioned_throughput(
endpointName='your-endpoint-name',
provisionedThroughput={
'capacityUnits': 10 # Number of throughput units to reserve
}
)
print('Provisioned Throughput updated:', response) output
Provisioned Throughput updated: {'ResponseMetadata': {'RequestId': '1234abcd-5678-efgh-ijkl-9012mnopqrst', 'HTTPStatusCode': 200, 'RetryAttempts': 0}} Common variations
You can adjust capacityUnits dynamically based on workload needs to scale throughput. AWS Bedrock also supports on-demand throughput without provisioning, but Provisioned Throughput ensures predictable latency.
For asynchronous calls, use asyncio with aiobotocore or AWS SDK v2 for Python.
import asyncio
import aiobotocore
async def update_throughput_async():
session = aiobotocore.get_session()
async with session.create_client('bedrock-runtime', region_name='us-east-1') as client:
response = await client.update_provisioned_throughput(
endpointName='your-endpoint-name',
provisionedThroughput={'capacityUnits': 20}
)
print('Async update response:', response)
asyncio.run(update_throughput_async()) output
Async update response: {'ResponseMetadata': {'RequestId': 'abcd1234-5678-efgh-ijkl-9012mnopqrst', 'HTTPStatusCode': 200, 'RetryAttempts': 0}} Troubleshooting
- If you receive
ThrottlingException, increasecapacityUnitsto handle more requests. - If
update_provisioned_throughputfails withResourceNotFoundException, verify yourendpointNameis correct and the endpoint exists. - Ensure your IAM role has
bedrock:UpdateProvisionedThroughputpermission.
Key Takeaways
- Use AWS Bedrock Provisioned Throughput to guarantee consistent LLM request capacity and low latency.
- Configure throughput units via AWS SDK or CLI by specifying capacityUnits for your Bedrock endpoint.
- Adjust provisioned capacity dynamically to match workload demands and avoid throttling.
- Ensure proper IAM permissions and correct endpoint names to avoid common errors.