How to call AWS Bedrock API in Python
Use a boto3 client for bedrock-runtime to call the AWS Bedrock API in Python, invoking the converse or invoke_model method with the appropriate model ID and message format.
Setup
pip install boto3
Environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION
import boto3
import json
import os
Examples
Integration steps
- Set up AWS credentials in environment variables or AWS config files.
- Import boto3 and create a client for 'bedrock-runtime' with the correct AWS region.
- Prepare the message payload in the required JSON format for the model.
- Call the 'converse' or 'invoke_model' method with the model ID and message body.
- Read the generated text from the response: converse returns an already-parsed dict, while invoke_model returns a JSON body that must be decoded first.
- Use or display the extracted text as needed in your application.
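The steps above name invoke_model as the alternative to converse, but the examples on this page only exercise converse. A minimal sketch of the invoke_model path follows, assuming an Anthropic Claude model: the request body uses Anthropic's native Messages schema (where content blocks do carry a "type" key), and the response body arrives as JSON bytes. The helper names build_claude_body, parse_claude_body and invoke_claude are hypothetical, and the max_tokens default is arbitrary.

```python
import json


def build_claude_body(prompt, max_tokens=256):
    """Serialize a single-turn request in Anthropic's native Messages schema."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": prompt}]}
        ],
    })


def parse_claude_body(raw):
    """Decode the JSON body and join the text of every text block."""
    payload = json.loads(raw)
    return "".join(
        block.get("text", "")
        for block in payload.get("content", [])
        if block.get("type") == "text"
    )


def invoke_claude(client, model_id, prompt):
    """Call invoke_model on a bedrock-runtime client and return the reply text."""
    response = client.invoke_model(modelId=model_id, body=build_claude_body(prompt))
    # response["body"] is a streaming object; read() returns the raw JSON bytes
    return parse_claude_body(response["body"].read())
```

With a real client this would be called as invoke_claude(boto3.client('bedrock-runtime'), model_id, 'Hello'). Other model families expect different body schemas, which is the main reason converse is usually the simpler choice.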
Full code
import boto3
import json
import os
# Initialize the Bedrock client
client = boto3.client('bedrock-runtime', region_name=os.environ.get('AWS_DEFAULT_REGION', 'us-east-1'))
# Define the model ID (example: Anthropic Claude 3.5 Sonnet)
model_id = 'anthropic.claude-3-5-sonnet-20241022-v2:0'
# Prepare the messages payload
messages = [
    {"role": "user", "content": [{"text": "Hello, who won the 2024 Olympics?"}]}
]
# Call the converse API
response = client.converse(
    modelId=model_id,
    messages=messages
)
# Extract the text from the response
output_message = response.get('output', {}).get('message', {})
if output_message and 'content' in output_message and len(output_message['content']) > 0:
    text = output_message['content'][0].get('text', '')
else:
    text = "No response received."
print("Model response:", text)
Output (wording will vary):
Model response: The 2024 Summer Olympics were held in Paris, and the United States topped the medal count.
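The extraction step above reads only the first content block. A slightly more defensive sketch is shown here: it joins every text block and also surfaces the stopReason and usage fields that a Converse response carries. The helper name summarize_converse_response is hypothetical, and the sample dict is hand-written rather than real model output.

```python
def summarize_converse_response(response):
    """Return (text, stop_reason, total_tokens) from a Converse response dict."""
    content = response.get("output", {}).get("message", {}).get("content", [])
    # A reply may hold several blocks; keep only the ones that carry text
    text = "".join(block["text"] for block in content if "text" in block)
    stop_reason = response.get("stopReason")
    total_tokens = response.get("usage", {}).get("totalTokens")
    return text, stop_reason, total_tokens


# Hand-written sample shaped like a Converse response (not real model output)
sample = {
    "output": {"message": {"role": "assistant", "content": [{"text": "Hi "}, {"text": "there"}]}},
    "stopReason": "end_turn",
    "usage": {"inputTokens": 5, "outputTokens": 3, "totalTokens": 8},
}
print(summarize_converse_response(sample))  # ('Hi there', 'end_turn', 8)
```

A stopReason of max_tokens means the reply was cut off at the output limit, which is worth checking before treating the text as complete.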
API trace
Request
{"modelId": "anthropic.claude-3-5-sonnet-20241022-v2:0", "messages": [{"role": "user", "content": [{"text": "Hello, who won the 2024 Olympics?"}]}]}
Response
{"output": {"message": {"content": [{"text": "The 2024 Summer Olympics were held in Paris, and the United States topped the medal count."}]}}}
Extract
response['output']['message']['content'][0]['text']
Variants
Streaming response with AWS Bedrock
Use when you want to handle the response incrementally as it is generated. The bedrock-runtime client exposes converse_stream for this.
import boto3
import os
client = boto3.client('bedrock-runtime', region_name=os.environ.get('AWS_DEFAULT_REGION', 'us-east-1'))
model_id = 'anthropic.claude-3-5-sonnet-20241022-v2:0'
messages = [{"role": "user", "content": [{"text": "Explain quantum computing."}]}]
# converse_stream returns an event stream instead of one complete message
response = client.converse_stream(modelId=model_id, messages=messages)
# Print each text delta as soon as it arrives
for event in response['stream']:
    delta = event.get('contentBlockDelta', {}).get('delta', {})
    if 'text' in delta:
        print(delta['text'], end='', flush=True)
print()
Async call using aiobotocore
Use for asynchronous applications where non-blocking calls to AWS Bedrock improve throughput.
# Requires: pip install aiobotocore
import asyncio
import os
from aiobotocore.session import get_session
async def async_bedrock_call():
    session = get_session()
    async with session.create_client('bedrock-runtime', region_name=os.environ.get('AWS_DEFAULT_REGION', 'us-east-1')) as client:
        model_id = 'anthropic.claude-3-5-sonnet-20241022-v2:0'
        messages = [{"role": "user", "content": [{"text": "Summarize the latest AI trends."}]}]
        response = await client.converse(modelId=model_id, messages=messages)
        text = response['output']['message']['content'][0]['text']
        print("Async response:", text)
asyncio.run(async_bedrock_call())
Invoke alternative model (Llama 3.1)
Use when you want to leverage a different model available on AWS Bedrock for varied capabilities or cost.
import boto3
import os
client = boto3.client('bedrock-runtime', region_name=os.environ.get('AWS_DEFAULT_REGION', 'us-east-1'))
model_id = 'meta.llama3-1-70b-instruct-v1:0'
messages = [{"role": "user", "content": [{"text": "Write a poem about spring."}]}]
response = client.converse(modelId=model_id, messages=messages)
text = response['output']['message']['content'][0]['text']
print("Llama model response:", text)
Performance
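- Limit the output length (maxTokens inside inferenceConfig for converse, or max_tokens in an Anthropic invoke_model body) to control cost and latency.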
- Use concise prompts to reduce token usage.
- Cache frequent queries to avoid repeated calls.
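Two of the tips above, capping output length and caching repeated queries, can be sketched together. This is one possible arrangement, not a prescribed pattern: the factory name make_cached_converse is hypothetical, the cache size and token cap are arbitrary, and the client is passed in so any bedrock-runtime client can be used.

```python
from functools import lru_cache


def make_cached_converse(client, model_id, max_tokens=256):
    """Build a prompt -> text function that caps output length and caches replies."""

    @lru_cache(maxsize=128)
    def ask(prompt):
        response = client.converse(
            modelId=model_id,
            messages=[{"role": "user", "content": [{"text": prompt}]}],
            # Converse spells the output cap as maxTokens inside inferenceConfig
            inferenceConfig={"maxTokens": max_tokens},
        )
        return response["output"]["message"]["content"][0]["text"]

    return ask
```

Usage would look like ask = make_cached_converse(boto3.client('bedrock-runtime'), model_id), after which repeated identical prompts are served from memory. Cached answers can go stale, so this suits FAQ-style lookups better than open-ended chat.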
| Approach | Latency | Cost/call | Best for |
|---|---|---|---|
| Synchronous boto3 call | ~1-2s | ~$0.005 | Simple scripts and batch jobs |
| Async aiobotocore call | ~1-2s (non-blocking) | ~$0.005 | High concurrency apps |
| Alternative model invocation | ~1-3s | ~$0.003-$0.01 | Use different model capabilities or cost optimization |
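The "high concurrency" row assumes many requests in flight at once. One way to sketch that while staying under account rate limits is to bound concurrency with a semaphore. Here call_model is a hypothetical stand-in for any coroutine that sends one prompt (for example a wrapper around an aiobotocore client), fake_call is a placeholder that makes no network request, and the limit values are arbitrary.

```python
import asyncio


async def gather_bounded(call_model, prompts, limit=5):
    """Run call_model(prompt) for every prompt, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def one(prompt):
        async with semaphore:
            return await call_model(prompt)

    # gather returns results in the same order as the input prompts
    return await asyncio.gather(*(one(p) for p in prompts))


async def fake_call(prompt):
    # Placeholder for a real Bedrock request; echoes the prompt after a short pause
    await asyncio.sleep(0.01)
    return prompt.upper()


print(asyncio.run(gather_bounded(fake_call, ["a", "b", "c"], limit=2)))  # ['A', 'B', 'C']
```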
Quick tip
Always format your messages as a list of role/content dicts, where content is a list of blocks such as {"text": "..."}, to match the input schema that converse expects.
Common mistake
Passing messages as plain strings, omitting the nested content list, or adding an Anthropic-style "type" key to a converse content block causes parameter validation errors.
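A small normalizer can guard against the plain-string mistake before the request is sent. The sketch below assumes the Converse content-block shape, in which a text block is written as {"text": ...}. The function name to_converse_messages is hypothetical.

```python
def to_converse_messages(prompt_or_messages):
    """Coerce a plain string, or loosely formed messages, into Converse format."""
    if isinstance(prompt_or_messages, str):
        return [{"role": "user", "content": [{"text": prompt_or_messages}]}]
    normalized = []
    for message in prompt_or_messages:
        content = message["content"]
        if isinstance(content, str):
            # A bare string becomes a single text block
            content = [{"text": content}]
        normalized.append({"role": message["role"], "content": content})
    return normalized


print(to_converse_messages("Hello"))
print(to_converse_messages([{"role": "user", "content": "Hi"}]))
```

The result can be passed directly as the messages argument of client.converse.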