
AWS Bedrock latency SLAs

Quick answer
AWS Bedrock does not publish a formal latency SLA as of 2026-04. Latency depends on the chosen foundation model and AWS region. To track and tune it, monitor the invocation metrics Bedrock emits to AWS CloudWatch and provision capacity accordingly.

Prerequisites

  • Python 3.8+
  • AWS CLI configured with credentials
  • boto3 installed (pip install boto3)

Setup

Install boto3 to interact with AWS Bedrock via Python and configure AWS CLI with your credentials.

bash
pip install boto3
output
Collecting boto3
  Downloading boto3-1.26.0-py3-none-any.whl (132 kB)
Installing collected packages: boto3
Successfully installed boto3-1.26.0

Step by step

Use boto3 to call AWS Bedrock APIs and monitor latency metrics via CloudWatch. Below is a sample script to fetch Bedrock model invocation metrics, including latency.

python
from datetime import datetime, timezone

import boto3

MODEL_ID = 'amazon.titan-text-express-v1'

# Initialize CloudWatch client in the region where the model is invoked
cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')

# Fetch hourly average invocation latency; Bedrock metrics are keyed by the ModelId dimension
response = cloudwatch.get_metric_statistics(
    Namespace='AWS/Bedrock',
    MetricName='InvocationLatency',
    Dimensions=[{'Name': 'ModelId', 'Value': MODEL_ID}],
    StartTime=datetime(2026, 4, 1, tzinfo=timezone.utc),
    EndTime=datetime(2026, 4, 2, tzinfo=timezone.utc),
    Period=3600,
    Statistics=['Average'],
    Unit='Milliseconds'
)

print(f'Latency metrics for {MODEL_ID}:')
# Datapoints come back in no guaranteed order, so sort them by timestamp
for datapoint in sorted(response['Datapoints'], key=lambda d: d['Timestamp']):
    print(f"Time: {datapoint['Timestamp']}, Average: {datapoint['Average']:.0f} ms")
output
Latency metrics for amazon.titan-text-express-v1:
Time: 2026-04-01 01:00:00+00:00, Average: 120 ms
Time: 2026-04-01 02:00:00+00:00, Average: 115 ms
...
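
Averages can hide slow outliers, which matters when you are setting your own latency targets in the absence of a published SLA. The sketch below requests percentiles instead, using the `ExtendedStatistics` parameter of `get_metric_statistics`. It assumes the `AWS/Bedrock` namespace, the `InvocationLatency` metric and the `ModelId` dimension; the helper names `build_percentile_query`, `tail_latency_rows` and `fetch_tail_latency` are hypothetical and exist only for illustration.

```python
from datetime import datetime, timezone


def build_percentile_query(model_id, start, end, percentiles=("p50", "p90", "p99")):
    """Build keyword arguments for CloudWatch get_metric_statistics.

    Only ExtendedStatistics is set: the API expects either Statistics
    or ExtendedStatistics in a single request, not both.
    """
    return {
        "Namespace": "AWS/Bedrock",
        "MetricName": "InvocationLatency",
        "Dimensions": [{"Name": "ModelId", "Value": model_id}],
        "StartTime": start,
        "EndTime": end,
        "Period": 3600,
        "ExtendedStatistics": list(percentiles),
    }


def tail_latency_rows(datapoints):
    """Return (timestamp, percentiles) pairs sorted by timestamp."""
    ordered = sorted(datapoints, key=lambda d: d["Timestamp"])
    return [(d["Timestamp"], d.get("ExtendedStatistics", {})) for d in ordered]


def fetch_tail_latency(model_id, start, end, region_name="us-east-1"):
    """Query CloudWatch and return sorted percentile rows (needs AWS credentials)."""
    import boto3  # imported here so the pure helpers above work without AWS access

    cloudwatch = boto3.client("cloudwatch", region_name=region_name)
    response = cloudwatch.get_metric_statistics(
        **build_percentile_query(model_id, start, end)
    )
    return tail_latency_rows(response["Datapoints"])


if __name__ == "__main__":
    rows = fetch_tail_latency(
        "amazon.titan-text-express-v1",
        datetime(2026, 4, 1, tzinfo=timezone.utc),
        datetime(2026, 4, 2, tzinfo=timezone.utc),
    )
    for timestamp, percentiles in rows:
        print(timestamp, percentiles)
```

Each row pairs an hourly timestamp with a dictionary keyed by the requested percentile names, so a p99 that sits far above the p50 points to occasional slow invocations rather than uniformly slow ones.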

Common variations

You can monitor latency for different Bedrock models by changing the value of the ModelId dimension. For near-real-time monitoring, use AWS CloudWatch dashboards or run the SDK calls asynchronously. Also consider calling the Bedrock endpoint in the region closest to your application to reduce network latency.

python
import asyncio
import functools

import boto3


async def get_latency_async():
    # boto3 clients are synchronous, so run the blocking call in a worker thread.
    # aiobotocore is an alternative if you want a natively async client.
    cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')
    call = functools.partial(
        cloudwatch.get_metric_statistics,
        Namespace='AWS/Bedrock',
        MetricName='InvocationLatency',
        Dimensions=[{'Name': 'ModelId', 'Value': 'amazon.titan-text-express-v1'}],
        StartTime='2026-04-01T00:00:00Z',
        EndTime='2026-04-02T00:00:00Z',
        Period=3600,
        Statistics=['Average'],
        Unit='Milliseconds',
    )
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, call)


if __name__ == '__main__':
    print(asyncio.run(get_latency_async()))

Awaiting the boto3 call directly does not work: boto3 has no native async support, so get_metric_statistics runs synchronously and returns a plain dict, and applying await to that result raises a TypeError.
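
To compare several models, the same query can be run once per model ID. The sketch below is one way to do that with a thread pool, since boto3 calls block. The function names `average_latency`, `fetch_datapoints` and `compare_models` are hypothetical, and the second model ID in the usage example is a placeholder rather than a real identifier.

```python
from concurrent.futures import ThreadPoolExecutor


def average_latency(datapoints):
    """Mean of the hourly 'Average' values, or None when there is no data.

    This is an unweighted mean of hourly averages, so quiet hours count
    as much as busy ones; treat it as a rough comparison only.
    """
    values = [d["Average"] for d in datapoints if "Average" in d]
    return sum(values) / len(values) if values else None


def fetch_datapoints(model_id, region_name="us-east-1"):
    """Fetch hourly average latency for one model (needs AWS credentials)."""
    import boto3  # imported here so the helpers can be exercised without AWS access

    cloudwatch = boto3.client("cloudwatch", region_name=region_name)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/Bedrock",
        MetricName="InvocationLatency",
        Dimensions=[{"Name": "ModelId", "Value": model_id}],
        StartTime="2026-04-01T00:00:00Z",
        EndTime="2026-04-02T00:00:00Z",
        Period=3600,
        Statistics=["Average"],
    )
    return response["Datapoints"]


def compare_models(model_ids, fetch=fetch_datapoints):
    """Return {model_id: mean latency in ms or None}, querying models in parallel."""
    model_ids = list(model_ids)
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(fetch, model_ids))
    return dict(zip(model_ids, map(average_latency, results)))


if __name__ == "__main__":
    # "your-second-model-id" is a placeholder; substitute a model you actually invoke.
    print(compare_models(["amazon.titan-text-express-v1", "your-second-model-id"]))
```

Passing the fetch function as a parameter keeps the comparison logic separate from the AWS call, which makes it easy to try out with canned datapoints before pointing it at a real account.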

Troubleshooting

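  • If latency metrics are missing, check that your IAM identity allows cloudwatch:GetMetricStatistics, that you are querying the region where the model was invoked, and that the dimension value matches the model ID exactly. CloudWatch only has datapoints for periods in which the model was actually invoked.
  • High latency may be due to network distance or regional issues; try switching to a closer AWS region where the model is available.
  • Ensure your Bedrock model invocation requests are properly authenticated and not throttled; the InvocationThrottles metric in the AWS/Bedrock namespace counts throttled requests.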
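
When deciding whether a closer region actually helps, client-side timing complements the CloudWatch numbers because it includes the network round trip. The following is a minimal sketch of that idea; `measure_latency_ms` and `fastest_region` are hypothetical helpers, and the callables passed in are assumed to wrap whatever request you want to time.

```python
import statistics
import time


def measure_latency_ms(call, attempts=5):
    """Run `call` several times and return each wall-clock latency in milliseconds."""
    samples = []
    for _ in range(attempts):
        started = time.perf_counter()
        call()
        samples.append((time.perf_counter() - started) * 1000.0)
    return samples


def fastest_region(calls_by_region, attempts=5):
    """Return (region, median_ms) for the region with the lowest median latency."""
    medians = {
        region: statistics.median(measure_latency_ms(call, attempts))
        for region, call in calls_by_region.items()
    }
    best = min(medians, key=medians.get)
    return best, medians[best]


if __name__ == "__main__":
    # Stand-in callables; in practice each one would send a small request
    # through a client created with a different region_name.
    demo = {
        "region-a": lambda: time.sleep(0.02),
        "region-b": lambda: time.sleep(0.01),
    }
    print(fastest_region(demo, attempts=3))
```

In real use each callable would issue a small model invocation against a client bound to one region. Keep the number of attempts low, since every invocation is billed, and confirm the model is offered in each region you test.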

Key Takeaways

  • AWS Bedrock does not offer formal public latency SLAs as of 2026-04.
  • Monitor invocation latency via AWS CloudWatch metrics under the AWS/Bedrock namespace.
  • Optimize latency by selecting appropriate AWS regions and provisioning capacity.
  • Use boto3 to programmatically retrieve latency metrics for your Bedrock models.
  • Check permissions and network conditions if latency metrics are missing or performance degrades.
Verified 2026-04 · amazon.titan-text-express-v1