AWS Bedrock latency optimization

Quick answer

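To optimize latency with AWS Bedrock, create one boto3 client and reuse it across requests so its pooled HTTP connections stay open, call the AWS region closest to where your application runs, and send independent requests concurrently instead of one after another. boto3 already keeps connections alive by default, so the main mistake to avoid is creating a new client per request. Concurrency cuts the total time for many requests; it does not make a single request faster.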
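
A minimal sketch of creating one tuned client per region and reusing it. It assumes a recent botocore whose Config accepts tcp_keepalive. The helper name get_bedrock_client and all numeric values are illustrative, not settings this article prescribes.

```python
from functools import lru_cache

# Illustrative values only; tune them for your own workload.
CLIENT_CONFIG_KWARGS = {
    "max_pool_connections": 20,  # botocore default is 10; raise it for many concurrent calls
    "tcp_keepalive": True,       # enable TCP keep-alive on newly created sockets
    "connect_timeout": 5,        # seconds allowed to establish a connection
    "read_timeout": 300,         # seconds to wait for a response; long generations need headroom
    "retries": {"max_attempts": 3, "mode": "standard"},  # botocore's built-in retry handling
}


@lru_cache(maxsize=None)
def get_bedrock_client(region_name: str = "us-east-1"):
    """Return one shared bedrock-runtime client per region, created on first use."""
    # Lazy import keeps this module importable where boto3 is not installed.
    import boto3
    from botocore.config import Config

    return boto3.client(
        "bedrock-runtime",
        region_name=region_name,
        config=Config(**CLIENT_CONFIG_KWARGS),
    )


if __name__ == "__main__":
    # Repeated calls return the same client object, so its connection pool is reused.
    assert get_bedrock_client() is get_bedrock_client()
```

Call get_bedrock_client() wherever you need the client; lru_cache hands back the same object each time, so connection setup is paid only once per region.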

PREREQUISITES

Python 3.8+
AWS credentials configured (~/.aws/credentials or environment variables)
pip install boto3

Setup

Install the boto3 library and configure AWS credentials for authentication. Make sure your AWS account has access to Bedrock and to the model you plan to call in your chosen region.

bash

pip install boto3

Step by step

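This example creates one boto3 client for the Bedrock Runtime service and makes a synchronous request with the Converse API. Create the client once (at module level or at application startup) and reuse it, so later calls skip client construction and connection setup.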

python

import boto3

# Create the client once and reuse it; boto3 keeps its HTTP connections in a pool
client = boto3.client('bedrock-runtime', region_name='us-east-1')

# Converse API content blocks are {"text": ...}; there is no "type" key
messages = [
    {"role": "user", "content": [{"text": "Hello, optimize latency."}]}
]

# Call the Bedrock Converse API; generation limits such as maxTokens go in inferenceConfig.
# Depending on the model and region, you may need an inference profile ID instead of the model ID.
response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    messages=messages,
    inferenceConfig={"maxTokens": 256}
)

# Extract and print the response text
text = response['output']['message']['content'][0]['text']
print(text)

output

(the model's generated reply; the exact text varies between runs)
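
To check whether client reuse and region choice help in your own setup, time a few calls. This is a hypothetical helper (time_calls and summarize are not boto3 APIs), and the candidate regions are an assumption; use regions where your model is available. With a reused client, the first call usually includes connection and TLS setup, so it tends to be slower than the calls after it. Setting maxTokens to 1 keeps generation time small so network and connection costs are easier to see.

```python
import statistics
import time
from typing import Callable, List


def time_calls(fn: Callable[[], object], runs: int = 5) -> List[float]:
    """Call fn() `runs` times and return each call's wall-clock duration in milliseconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append((time.perf_counter() - start) * 1000)
    return durations


def summarize(durations: List[float]) -> dict:
    """Separate the first (cold) call from the later calls that reuse the connection."""
    warm = durations[1:] or durations
    return {"first_ms": durations[0], "warm_median_ms": statistics.median(warm)}


if __name__ == "__main__":
    import boto3

    # Assumption: candidate regions where the chosen model is available to your account.
    for region in ["us-east-1", "us-west-2"]:
        client = boto3.client("bedrock-runtime", region_name=region)  # one client per region, reused

        def call():
            client.converse(
                modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
                messages=[{"role": "user", "content": [{"text": "ping"}]}],
                inferenceConfig={"maxTokens": 1},
            )

        print(region, summarize(time_calls(call, runs=5)))
```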

Common variations

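When you have several independent prompts, send them concurrently, for example with aiobotocore (an asyncio-based version of botocore), so the total wall-clock time can approach that of the slowest request rather than the sum of all of them. Each individual request still takes as long as the model needs to respond. Also call the AWS region closest to where your application runs, and reuse the client instance so its HTTP connection pool is not rebuilt on every call.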

python

import asyncio
from aiobotocore.session import get_session  # current aiobotocore import path for get_session

async def async_bedrock_call():
    session = get_session()
    # Requires an aiobotocore release whose bundled botocore includes the Converse API
    async with session.create_client('bedrock-runtime', region_name='us-east-1') as client:
        messages = [
            {"role": "user", "content": [{"text": "Async latency optimization."}]}
        ]
        response = await client.converse(
            modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
            messages=messages,
            inferenceConfig={"maxTokens": 256}
        )
        text = response['output']['message']['content'][0]['text']
        print(text)

asyncio.run(async_bedrock_call())

output

(the model's generated reply; the exact text varies between runs)
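
The async example above sends a single request, so on its own it is no faster than the synchronous version; the gain comes from running several independent requests at once. A minimal sketch, assuming aiobotocore's get_session and a model available in us-east-1. The names run_concurrently and max_in_flight are hypothetical; the cap keeps concurrent calls within your account's request quotas and the client's connection pool.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def run_concurrently(
    prompts: Sequence[str],
    call: Callable[[str], Awaitable[str]],
    max_in_flight: int = 4,
) -> List[str]:
    """Run call(prompt) for every prompt, at most max_in_flight at a time; results keep input order."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def limited(prompt: str) -> str:
        async with semaphore:
            return await call(prompt)

    return list(await asyncio.gather(*(limited(p) for p in prompts)))


async def main() -> None:
    from aiobotocore.session import get_session

    session = get_session()
    # One client shared by all concurrent requests
    async with session.create_client("bedrock-runtime", region_name="us-east-1") as client:

        async def ask(prompt: str) -> str:
            response = await client.converse(
                modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
                messages=[{"role": "user", "content": [{"text": prompt}]}],
                inferenceConfig={"maxTokens": 256},
            )
            return response["output"]["message"]["content"][0]["text"]

        answers = await run_concurrently(["Prompt A", "Prompt B", "Prompt C"], ask)
        for answer in answers:
            print(answer)


if __name__ == "__main__":
    asyncio.run(main())
```

asyncio.gather preserves the order of its inputs, so answers line up with prompts even though the requests finish in any order.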

Troubleshooting

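If you experience high latency, confirm that the client's region is the one closest to where your application runs (for server-side code, that means your servers, not your end users). Make sure proxies, load balancers, or NAT gateways on your network are not closing idle connections early, which forces a new TCP and TLS handshake on the next call. Check for throttling errors (ThrottlingException) and retry them with exponential backoff.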
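
botocore already retries throttled calls when you set retries={"mode": "standard"} or "adaptive" in botocore.config.Config, and that is usually the simplest option. If you want explicit control, a minimal sketch of exponential backoff with full jitter looks like this. call_with_backoff and is_retryable are hypothetical helpers, and the set of retryable error codes is an assumption to check against the Bedrock Runtime API reference.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumed retryable Bedrock Runtime error codes; verify against the API reference.
RETRYABLE_CODES = {"ThrottlingException", "ServiceUnavailableException"}


def is_retryable(exc: Exception) -> bool:
    """True if exc carries a botocore-style response dict with a retryable error code."""
    response = getattr(exc, "response", None)
    if not isinstance(response, dict):
        return False
    return response.get("Error", {}).get("Code") in RETRYABLE_CODES


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying retryable errors with exponentially growing, fully jittered delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            # Give up on the last attempt or on errors that retrying will not fix
            if attempt == max_attempts - 1 or not is_retryable(exc):
                raise
            # Full jitter: random delay between 0 and the capped exponential bound
            bound = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, bound))
    # Only reached when max_attempts < 1
    raise ValueError("max_attempts must be at least 1")
```

With the client and messages from the first example, a call would look like call_with_backoff(lambda: client.converse(modelId="anthropic.claude-3-5-sonnet-20241022-v2:0", messages=messages, inferenceConfig={"maxTokens": 256})). The sleep parameter exists so the delays can be observed or skipped in tests.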


Key Takeaways

Reuse the boto3 client instance to leverage HTTP connection pooling and reduce latency.
Send independent requests concurrently (for example with asyncio and aiobotocore) to raise throughput and cut total wall-clock time; this does not shorten any single request.
Select the AWS region closest to where your application runs to minimize network latency.
Implement retries with exponential backoff to handle transient throttling and network issues.

Verified 2026-04 · anthropic.claude-3-5-sonnet-20241022-v2:0
