AWS Bedrock latency SLAs
Quick answer
AWS Bedrock does not publish a formal latency SLA as of 2026-04. Observed latency depends on the foundation model you choose and the AWS region you call. To manage it, track the invocation metrics Bedrock publishes to CloudWatch and provision capacity where your workload needs consistent throughput.
Prerequisites
- Python 3.8+
- AWS CLI configured with credentials
- boto3 installed (pip install boto3)
Setup
Install boto3 to interact with AWS Bedrock via Python and configure AWS CLI with your credentials.
pip install boto3
output
Collecting boto3
  Downloading boto3-1.26.0-py3-none-any.whl (132 kB)
Installing collected packages: boto3
Successfully installed boto3-1.26.0
Step by step
Use boto3 to call AWS Bedrock APIs and monitor latency metrics via CloudWatch. Below is a sample script to fetch Bedrock model invocation metrics, including latency.
import boto3
from datetime import datetime, timezone

# Initialize the CloudWatch client in the region where the model is invoked
cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')

# Bedrock publishes runtime metrics per model under the ModelId dimension
model_id = 'amazon.titan-text-express-v1'
response = cloudwatch.get_metric_statistics(
    Namespace='AWS/Bedrock',
    MetricName='InvocationLatency',
    Dimensions=[{'Name': 'ModelId', 'Value': model_id}],
    StartTime=datetime(2026, 4, 1, tzinfo=timezone.utc),
    EndTime=datetime(2026, 4, 2, tzinfo=timezone.utc),
    Period=3600,  # one datapoint per hour
    Statistics=['Average'],
    Unit='Milliseconds'
)

print(f'Latency metrics for {model_id}:')
# Datapoints are not guaranteed to arrive in time order, so sort them first
for datapoint in sorted(response['Datapoints'], key=lambda d: d['Timestamp']):
    print(f"Time: {datapoint['Timestamp']}, Average: {datapoint['Average']} ms")
output
Latency metrics for amazon.titan-text-express-v1:
Time: 2026-04-01 01:00:00+00:00, Average: 120 ms
Time: 2026-04-01 02:00:00+00:00, Average: 115 ms
...
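Averages can hide slow outliers, so percentile statistics are often more useful when reasoning about latency targets. The sketch below is one possible way to request p50/p90/p99 through the ExtendedStatistics parameter of get_metric_statistics and flatten the result. The helper names fetch_latency_percentiles and summarize_percentiles are hypothetical, and the region default is the same placeholder used in the script above. CloudWatch accepts either Statistics or ExtendedStatistics in a single call, not both.

```python
def fetch_latency_percentiles(model_id, start, end, region="us-east-1"):
    """Query CloudWatch for hourly p50/p90/p99 InvocationLatency.

    Needs AWS credentials; `start` and `end` are timezone-aware datetimes.
    """
    import boto3  # imported lazily so the pure helper below works without AWS access

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/Bedrock",
        MetricName="InvocationLatency",
        Dimensions=[{"Name": "ModelId", "Value": model_id}],
        StartTime=start,
        EndTime=end,
        Period=3600,
        ExtendedStatistics=["p50", "p90", "p99"],
    )
    return response["Datapoints"]


def summarize_percentiles(datapoints):
    """Sort datapoints by time and return (timestamp, p50, p90, p99) tuples."""
    rows = []
    for dp in sorted(datapoints, key=lambda d: d["Timestamp"]):
        stats = dp.get("ExtendedStatistics", {})
        # A percentile missing from a datapoint is reported as None
        rows.append((dp["Timestamp"], stats.get("p50"), stats.get("p90"), stats.get("p99")))
    return rows
```

A typical use would be to pass the result of fetch_latency_percentiles('amazon.titan-text-express-v1', start, end) to summarize_percentiles and print each row.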
Common variations
You can monitor latency for different Bedrock models by changing the value of the ModelId dimension. For near-real-time monitoring, use a CloudWatch dashboard or poll the metrics asynchronously from your own code. Also consider calling Bedrock through the regional endpoint closest to your callers to reduce network latency.
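When several models need to be compared, one option is CloudWatch's get_metric_data, which accepts multiple metric queries in a single request. The following is a minimal sketch under that assumption: build_latency_queries and fetch_latency_by_model are hypothetical helper names, and pagination through NextToken is left out for brevity.

```python
def build_latency_queries(model_ids, period=3600, stat="Average"):
    """Build one MetricDataQueries entry per model for get_metric_data."""
    queries = []
    for index, model_id in enumerate(model_ids):
        queries.append({
            "Id": f"m{index}",  # query IDs must start with a lowercase letter
            "Label": model_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/Bedrock",
                    "MetricName": "InvocationLatency",
                    "Dimensions": [{"Name": "ModelId", "Value": model_id}],
                },
                "Period": period,
                "Stat": stat,
            },
            "ReturnData": True,
        })
    return queries


def fetch_latency_by_model(model_ids, start, end, region="us-east-1"):
    """Return {model_id: [(timestamp, value), ...]}; needs AWS credentials."""
    import boto3  # imported lazily so build_latency_queries works without AWS access

    queries = build_latency_queries(model_ids)
    id_to_model = {query["Id"]: query["Label"] for query in queries}
    cloudwatch = boto3.client("cloudwatch", region_name=region)
    response = cloudwatch.get_metric_data(
        MetricDataQueries=queries, StartTime=start, EndTime=end
    )
    return {
        id_to_model[result["Id"]]: sorted(zip(result["Timestamps"], result["Values"]))
        for result in response["MetricDataResults"]
    }
```

Keeping the query construction separate from the network call makes it easy to inspect or unit-test the request before sending it.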
Because boto3 is synchronous, an async variant has to hand the blocking call to a thread executor:
import asyncio
import functools
from datetime import datetime, timezone

import boto3

# boto3 does not natively support async calls, so run the blocking call
# in a thread executor (aiobotocore is an alternative)
async def get_latency_async():
    session = boto3.Session()
    cloudwatch = session.client('cloudwatch', region_name='us-east-1')
    call = functools.partial(
        cloudwatch.get_metric_statistics,
        Namespace='AWS/Bedrock',
        MetricName='InvocationLatency',
        Dimensions=[{'Name': 'ModelId', 'Value': 'amazon.titan-text-express-v1'}],
        StartTime=datetime(2026, 4, 1, tzinfo=timezone.utc),
        EndTime=datetime(2026, 4, 2, tzinfo=timezone.utc),
        Period=3600,
        Statistics=['Average'],
        Unit='Milliseconds'
    )
    # The event loop stays free while the request runs in a worker thread
    loop = asyncio.get_running_loop()
    response = await loop.run_in_executor(None, call)
    print(response)

asyncio.run(get_latency_async())
Note: awaiting the boto3 call directly, as in await cloudwatch.get_metric_statistics(...), raises a TypeError because the call returns a plain dict rather than an awaitable.
Troubleshooting
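- If latency metrics are missing, verify that your IAM identity is allowed to read CloudWatch metrics (cloudwatch:GetMetricStatistics), that you are querying the region where the model is invoked, and that the model was actually invoked during the requested time window.
- High latency may be due to regional network issues; try switching to a closer AWS region.
- Ensure your Bedrock model invocation requests are properly authenticated and not throttled; the InvocationThrottles metric in the AWS/Bedrock namespace counts throttled requests.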
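To separate regional network delay from model-side latency, it can help to time a few calls from the client and compare the numbers with the InvocationLatency metric. The sketch below is only an illustration: measure_latency_ms, summarize and make_titan_call are hypothetical helpers, and the request body shape ({"inputText": ...}) is an assumption about the Titan Text model used in this article.

```python
import statistics
import time


def measure_latency_ms(call, runs=5):
    """Invoke `call` `runs` times and return wall-clock latencies in milliseconds."""
    samples = []
    for _ in range(runs):
        started = time.perf_counter()
        call()
        samples.append((time.perf_counter() - started) * 1000.0)
    return samples


def summarize(samples):
    """Return the min, median and max of the collected samples."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def make_titan_call(region, model_id="amazon.titan-text-express-v1"):
    """Build a zero-argument callable that sends one small prompt (needs AWS credentials)."""
    import json

    import boto3  # imported lazily so the timing helpers work without AWS access

    runtime = boto3.client("bedrock-runtime", region_name=region)
    body = json.dumps({"inputText": "ping"})  # assumed Titan Text request shape

    def call():
        response = runtime.invoke_model(modelId=model_id, body=body)
        response["body"].read()  # read the full response so it is included in the timing

    return call
```

Running summarize(measure_latency_ms(make_titan_call(region))) for two candidate regions gives a rough client-side comparison; the figures include network time, so they will generally be higher than the service-side metric.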
Key takeaways
- AWS Bedrock does not offer formal public latency SLAs as of 2026-04.
- Monitor invocation latency via AWS CloudWatch metrics under the AWS/Bedrock namespace.
- Optimize latency by selecting appropriate AWS regions and provisioning capacity.
- Use boto3 to programmatically retrieve latency metrics for your Bedrock models.
- Check permissions and network conditions if latency metrics are missing or performance degrades.