How-to · Beginner · 3 min read

How to use Llama on AWS Bedrock

Quick answer
Use the boto3 bedrock-runtime client to invoke Llama models on AWS Bedrock by calling converse or invoke_model with the appropriate modelId, such as meta.llama3-1-70b-instruct-v1:0. With converse, pass chat messages as content blocks with a text key; with invoke_model, send the model's native JSON body and read the generated text from the response.

PREREQUISITES

  • Python 3.8+
  • AWS account with Bedrock access
  • AWS CLI configured or AWS credentials set in environment
  • pip install boto3

Setup

Install the boto3 library and configure AWS credentials with Bedrock permissions. You must also request access to the Llama model meta.llama3-1-70b-instruct-v1:0 on the Model access page of the AWS Bedrock console before you can invoke it.

bash
pip install boto3
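If you have not run aws configure, boto3 will also pick up credentials from the standard AWS environment variables; a minimal sketch (the key values below are placeholders, not real credentials):

```shell
# Placeholder credentials -- substitute your own IAM access key pair
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY_ID"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_ACCESS_KEY"
# Pick a region where Bedrock offers the Llama model you want
export AWS_DEFAULT_REGION="us-east-1"
```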

Step by step

This example shows how to call the Llama model on AWS Bedrock using boto3. It sends a chat message and prints the model's response.

python
import boto3

# Initialize the Bedrock runtime client
client = boto3.client('bedrock-runtime', region_name='us-east-1')

# Define the Llama model ID
model_id = 'meta.llama3-1-70b-instruct-v1:0'

# Prepare the chat messages payload. Converse API content blocks
# use a bare "text" key -- there is no "type" field.
messages = [
    {
        "role": "user",
        "content": [
            {"text": "Explain the benefits of using AWS Bedrock with Llama models."}
        ]
    }
]

# Call the Converse API
response = client.converse(
    modelId=model_id,
    messages=messages
)

# Extract and print the response text
output_message = response.get('output', {}).get('message', {})
content_list = output_message.get('content', [])
# Find the first text content block
text_blocks = [item['text'] for item in content_list if 'text' in item]
if text_blocks:
    print("Llama response:", text_blocks[0])
else:
    print("No response content received.")
output
Llama response: AWS Bedrock provides a managed environment to easily deploy and scale Llama models, enabling developers to integrate powerful language models without managing infrastructure.
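The same content-block format extends to multi-turn conversations: prior assistant replies go back into the messages list before the next user turn. A small sketch of both halves of that loop (the extract_text and add_turn helpers are my names, not part of boto3):

```python
def extract_text(response):
    """Return the first text block from a Converse API response dict."""
    content = response.get("output", {}).get("message", {}).get("content", [])
    for block in content:
        if "text" in block:  # Converse content blocks use a bare "text" key
            return block["text"]
    return None


def add_turn(messages, role, text):
    """Append one chat turn in the Converse messages format."""
    messages.append({"role": role, "content": [{"text": text}]})
    return messages


# Build a running history: each assistant reply is appended before
# the next user message, then the whole list is sent to converse()
history = []
add_turn(history, "user", "What is AWS Bedrock?")
add_turn(history, "assistant", "A managed service for foundation models.")
add_turn(history, "user", "Which Llama models does it offer?")
```

Passing `history` as the `messages` argument gives the model the full conversation context on each call.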

Common variations

  • Use invoke_model with the model's native JSON request body (for Llama: prompt, max_gen_len, temperature) instead of converse.
  • Change region_name to your AWS region.
  • Adjust messages format for different prompt types or multi-turn conversations.
  • Use AWS IAM roles or environment variables for authentication instead of explicit credentials.
python
import boto3
import json

client = boto3.client('bedrock-runtime', region_name='us-east-1')

model_id = 'meta.llama3-1-70b-instruct-v1:0'

# Llama models take a native request format: a single formatted prompt
# string using the Llama 3 instruct chat template, not a messages array
prompt = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "Summarize the latest AI trends."
    "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)

body = json.dumps({
    "prompt": prompt,
    "max_gen_len": 512,
    "temperature": 0.5
})

response = client.invoke_model(
    modelId=model_id,
    body=body
)

# The response body is a streaming object; read it before parsing
output = json.loads(response['body'].read())
print("Response:", output['generation'])
output
Response: The latest AI trends include foundation models, generative AI, multimodal learning, and increased focus on AI safety and ethics.
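Unlike converse, invoke_model leaves chat formatting entirely to you, so multi-turn conversations must be rendered into the Llama 3 instruct template by hand. A sketch of that rendering (the build_llama3_prompt helper is mine; the special tokens follow Meta's published Llama 3 chat template):

```python
def build_llama3_prompt(turns):
    """Render (role, text) chat turns into the Llama 3 instruct template.

    Roles are "system", "user", or "assistant"; the trailing assistant
    header cues the model to generate its reply.
    """
    parts = ["<|begin_of_text|>"]
    for role, text in turns:
        parts.append(
            f"<|start_header_id|>{role}<|end_header_id|>\n\n{text}<|eot_id|>"
        )
    # Open an assistant turn so the model completes it
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = build_llama3_prompt([
    ("system", "You are a concise assistant."),
    ("user", "Summarize the latest AI trends."),
])
```

The resulting string goes into the `prompt` field of the invoke_model body exactly as in the example above.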

Troubleshooting

  • If you get AccessDeniedException, verify that your AWS IAM permissions include Bedrock access and that model access has been granted in the Bedrock console.
  • If the modelId is invalid, confirm the model name and version in your AWS Bedrock console.
  • If you get a ValidationException about on-demand throughput, the model may require a cross-region inference profile ID (e.g., us.meta.llama3-1-70b-instruct-v1:0) instead of the base model ID.
  • For converse request-format errors, ensure each message has a role and a content list of blocks with a text key.
  • Check that Bedrock, and the specific model, are available in your AWS region.
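The checks above can be folded into a small lookup keyed on the exception code (the error names are standard Bedrock runtime exceptions; the hint_for helper and the hint strings are mine):

```python
# Common Bedrock error codes mapped to likely fixes (hint text is illustrative)
ERROR_HINTS = {
    "AccessDeniedException": "Check IAM permissions and model access in the Bedrock console.",
    "ResourceNotFoundException": "Verify the modelId and its availability in your region.",
    "ValidationException": "Check the request body format, or use an inference profile ID.",
    "ThrottlingException": "Back off and retry the request.",
}


def hint_for(error_code):
    """Map a Bedrock exception code to a troubleshooting hint."""
    return ERROR_HINTS.get(error_code, "See the Bedrock documentation for this error.")
```

In practice you would read the code from a caught botocore ClientError via `err.response["Error"]["Code"]` and look it up here.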

Key Takeaways

  • Use boto3's bedrock-runtime client to call Llama models on AWS Bedrock with the correct modelId.
  • Pass chat messages with a role and a content list of text blocks (a bare text key per block) for successful converse requests.
  • Handle AWS credentials and permissions carefully to avoid access errors.
  • Use either converse or invoke_model methods depending on your payload format preference.
Verified 2026-04 · meta.llama3-1-70b-instruct-v1:0