How to deploy an ML model to AWS
Quick answer
To deploy an ML model to AWS, use AWS SageMaker for managed training and hosting. Upload your trained model artifact to S3, create a SageMaker model, then deploy it as an endpoint for real-time inference via the boto3 SDK or the SageMaker console.

Prerequisites

- Python 3.8+
- AWS account with IAM permissions for SageMaker and S3
- AWS CLI configured with credentials
- `pip install boto3 sagemaker`
Set up the AWS environment
Install required Python packages and configure AWS CLI with your credentials. Ensure your IAM user has permissions for S3 and SageMaker.
```shell
pip install boto3 sagemaker
```

Output:

```
Requirement already satisfied: boto3
Requirement already satisfied: sagemaker
```
Step by step deployment
This example uploads a trained model artifact to S3, creates a SageMaker model, and deploys it as a real-time endpoint.
```python
import boto3
from sagemaker import Session
from sagemaker.model import Model

# Initialize clients and session
s3_client = boto3.client('s3')
sagemaker_session = Session()
role = 'arn:aws:iam::123456789012:role/SageMakerExecutionRole'  # Replace with your SageMaker role ARN

# Parameters
bucket = 'your-s3-bucket-name'  # Replace with your S3 bucket
model_artifact = 'model/model.tar.gz'  # Local path to your trained model archive
s3_model_path = f's3://{bucket}/model/model.tar.gz'

# Upload the model artifact to S3
s3_client.upload_file(model_artifact, bucket, 'model/model.tar.gz')
print(f'Model uploaded to {s3_model_path}')

# Define a SageMaker model
model = Model(
    model_data=s3_model_path,
    role=role,
    image_uri='763104351884.dkr.ecr.us-west-2.amazonaws.com/tensorflow-inference:2.12-cpu',  # Example TensorFlow inference image
    sagemaker_session=sagemaker_session,
)

# Deploy the model to a real-time endpoint
predictor = model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.large',
    endpoint_name='my-ml-model-endpoint',
)
print('Model deployed at endpoint:', predictor.endpoint_name)
```

Output:

```
Model uploaded to s3://your-s3-bucket-name/model/model.tar.gz
Model deployed at endpoint: my-ml-model-endpoint
```
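Once the endpoint is `InService`, you can send it inference requests. A sketch assuming a JSON-serving container such as the TensorFlow image above (the `{'instances': ...}` payload shape is the TF Serving convention); the helper only builds the request arguments, and the actual call (commented out) needs live credentials and a running endpoint:

```python
import json

def build_invoke_args(endpoint_name, payload):
    # Keyword arguments for the sagemaker-runtime invoke_endpoint API
    return {
        'EndpointName': endpoint_name,
        'ContentType': 'application/json',
        'Body': json.dumps(payload),
    }

args = build_invoke_args('my-ml-model-endpoint', {'instances': [[1.0, 2.0, 3.0]]})
print(args['Body'])

# With a live endpoint (requires AWS credentials):
# import boto3
# runtime = boto3.client('sagemaker-runtime')
# response = runtime.invoke_endpoint(**args)
# result = json.loads(response['Body'].read())
```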
Common variations
- Use asynchronous inference with the SageMaker SDK for non-blocking, queued requests.
- Deploy different model frameworks by changing `image_uri` (PyTorch, XGBoost, etc.).
- Use boto3 directly to create endpoints without the SageMaker SDK.
```python
from sagemaker.async_inference import AsyncInferenceConfig

# SageMaker Asynchronous Inference queues requests server-side and writes
# results to S3; deploy() itself is still a blocking call, so no asyncio
# wrapper is needed.
async_inference_config = AsyncInferenceConfig(
    output_path=f's3://{bucket}/async-inference-output'  # Results land here
)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.large',
    async_inference_config=async_inference_config,
    endpoint_name='my-async-endpoint',
)
print('Async model deployed at endpoint:', predictor.endpoint_name)
```

Output:

```
Async model deployed at endpoint: my-async-endpoint
```
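For the boto3-only route mentioned above, the three relevant API calls are `create_model`, `create_endpoint_config`, and `create_endpoint`. A sketch reusing the role, image, and S3 path from the earlier example; the helper builds the endpoint-config request body, and the API calls themselves (commented out) need live credentials:

```python
def endpoint_config_spec(config_name, model_name,
                         instance_type='ml.m5.large', count=1):
    # Request body for the sagemaker create_endpoint_config API
    return {
        'EndpointConfigName': config_name,
        'ProductionVariants': [{
            'VariantName': 'AllTraffic',
            'ModelName': model_name,
            'InstanceType': instance_type,
            'InitialInstanceCount': count,
        }],
    }

spec = endpoint_config_spec('my-ml-model-config', 'my-ml-model')
print(spec['ProductionVariants'][0]['InstanceType'])

# With live credentials:
# import boto3
# sm = boto3.client('sagemaker')
# sm.create_model(ModelName='my-ml-model',
#                 PrimaryContainer={'Image': image_uri, 'ModelDataUrl': s3_model_path},
#                 ExecutionRoleArn=role)
# sm.create_endpoint_config(**spec)
# sm.create_endpoint(EndpointName='my-ml-model-endpoint',
#                    EndpointConfigName='my-ml-model-config')
```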
Troubleshooting common issues
- If you get `AccessDenied`, verify your IAM role has the `AmazonSageMakerFullAccess` and `AmazonS3FullAccess` managed policies (or equivalent scoped permissions).
- Deployment fails if the `image_uri` is incorrect or its region is mismatched; confirm the container URI matches your AWS region.
- Timeouts during deployment often mean the container cannot pass its health check in time; try a larger instance type or a smaller model artifact.
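To catch the region-mismatch issue before deploying, you can parse the region out of the ECR image URI and compare it with your configured region. A small sketch; `image_region` is a hypothetical helper, not part of any AWS SDK:

```python
def image_region(image_uri):
    # ECR image URIs embed the region:
    # <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>
    return image_uri.split('.dkr.ecr.')[1].split('.amazonaws.com')[0]

uri = '763104351884.dkr.ecr.us-west-2.amazonaws.com/tensorflow-inference:2.12-cpu'
print(image_region(uri))  # us-west-2

# Compare against your configured region before calling model.deploy():
# import boto3
# assert image_region(uri) == boto3.session.Session().region_name
```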
Key Takeaways
- Use AWS SageMaker for scalable, managed ML model deployment with minimal infrastructure setup.
- Upload your trained model artifacts to S3 before creating a SageMaker model for deployment.
- Choose the correct inference container image URI based on your model framework and AWS region.
- Deploy models synchronously or asynchronously depending on your application latency needs.
- Ensure IAM roles have proper permissions to avoid common access and deployment errors.