How to deploy an ML model to AWS
Quick answer
To deploy an ML model to AWS, use AWS SageMaker for managed training and hosting. Upload your trained model artifact to S3, create a SageMaker model, then deploy it as an endpoint for real-time inference via the boto3 SDK or the SageMaker console.

Prerequisites

- Python 3.8+
- AWS account with IAM permissions for SageMaker and S3
- AWS CLI configured with credentials
- `pip install boto3 sagemaker`
Set up the AWS environment
Install required Python packages and configure AWS CLI with your credentials. Ensure your IAM user has permissions for S3 and SageMaker.
```shell
pip install boto3 sagemaker
```

Output:

```
Requirement already satisfied: boto3
Requirement already satisfied: sagemaker
```
Step by step deployment
This example uploads a trained model artifact to S3, creates a SageMaker model, and deploys it as a real-time endpoint.
```python
import boto3
from sagemaker import Session
from sagemaker.model import Model

# Initialize clients and session
s3_client = boto3.client('s3')
sagemaker_session = Session()
role = 'arn:aws:iam::123456789012:role/SageMakerExecutionRole'  # Replace with your SageMaker role ARN

# Parameters
bucket = 'your-s3-bucket-name'  # Replace with your S3 bucket
model_artifact = 'model/model.tar.gz'  # Local path to your trained model archive
s3_model_path = f's3://{bucket}/model/model.tar.gz'

# Upload the model artifact to S3
s3_client.upload_file(model_artifact, bucket, 'model/model.tar.gz')
print(f'Model uploaded to {s3_model_path}')

# Define a SageMaker model
model = Model(
    model_data=s3_model_path,
    role=role,
    image_uri='763104351884.dkr.ecr.us-west-2.amazonaws.com/tensorflow-inference:2.12-cpu',  # Example TensorFlow inference image
    sagemaker_session=sagemaker_session,
)

# Deploy the model to a real-time endpoint
predictor = model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.large',
    endpoint_name='my-ml-model-endpoint',
)
print('Model deployed at endpoint:', predictor.endpoint_name)
```

Output:

```
Model uploaded to s3://your-s3-bucket-name/model/model.tar.gz
Model deployed at endpoint: my-ml-model-endpoint
```
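Once the endpoint is `InService`, you can send it inference requests. A sketch assuming a JSON-serving container such as the TensorFlow image above (the `{'instances': ...}` payload shape is the TF Serving convention); the helper only builds the request arguments, and the actual call (commented out) needs live credentials and a running endpoint:

```python
import json

def build_invoke_args(endpoint_name, payload):
    # Keyword arguments for the sagemaker-runtime invoke_endpoint API
    return {
        'EndpointName': endpoint_name,
        'ContentType': 'application/json',
        'Body': json.dumps(payload),
    }

args = build_invoke_args('my-ml-model-endpoint', {'instances': [[1.0, 2.0, 3.0]]})
print(args['Body'])

# With a live endpoint (requires AWS credentials):
# import boto3
# runtime = boto3.client('sagemaker-runtime')
# response = runtime.invoke_endpoint(**args)
# result = json.loads(response['Body'].read())
```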
Common variations
- Use asynchronous inference with the SageMaker SDK for non-blocking, queued requests.
- Deploy different model frameworks by changing `image_uri` (PyTorch, XGBoost, etc.).
- Use boto3 directly to create endpoints without the SageMaker SDK.
```python
from sagemaker.async_inference import AsyncInferenceConfig

# SageMaker Asynchronous Inference queues requests server-side and writes
# results to S3; deploy() itself is still a blocking call, so no asyncio
# wrapper is needed.
async_inference_config = AsyncInferenceConfig(
    output_path=f's3://{bucket}/async-inference-output'  # Results land here
)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.large',
    async_inference_config=async_inference_config,
    endpoint_name='my-async-endpoint',
)
print('Async model deployed at endpoint:', predictor.endpoint_name)
```

Output:

```
Async model deployed at endpoint: my-async-endpoint
```
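For the boto3-only route mentioned above, the three relevant API calls are `create_model`, `create_endpoint_config`, and `create_endpoint`. A sketch reusing the role, image, and S3 path from the earlier example; the helper builds the endpoint-config request body, and the API calls themselves (commented out) need live credentials:

```python
def endpoint_config_spec(config_name, model_name,
                         instance_type='ml.m5.large', count=1):
    # Request body for the sagemaker create_endpoint_config API
    return {
        'EndpointConfigName': config_name,
        'ProductionVariants': [{
            'VariantName': 'AllTraffic',
            'ModelName': model_name,
            'InstanceType': instance_type,
            'InitialInstanceCount': count,
        }],
    }

spec = endpoint_config_spec('my-ml-model-config', 'my-ml-model')
print(spec['ProductionVariants'][0]['InstanceType'])

# With live credentials:
# import boto3
# sm = boto3.client('sagemaker')
# sm.create_model(ModelName='my-ml-model',
#                 PrimaryContainer={'Image': image_uri, 'ModelDataUrl': s3_model_path},
#                 ExecutionRoleArn=role)
# sm.create_endpoint_config(**spec)
# sm.create_endpoint(EndpointName='my-ml-model-endpoint',
#                    EndpointConfigName='my-ml-model-config')
```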
Troubleshooting common issues
- If you get `AccessDenied`, verify your IAM role has the `AmazonSageMakerFullAccess` and `AmazonS3FullAccess` managed policies (or equivalent scoped permissions).
- Deployment fails if the `image_uri` is incorrect or its region is mismatched; confirm the container URI matches your AWS region.
- Timeouts during deployment often mean the container cannot pass its health check in time; try a larger instance type or a smaller model artifact.
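To catch the region-mismatch issue before deploying, you can parse the region out of the ECR image URI and compare it with your configured region. A small sketch; `image_region` is a hypothetical helper, not part of any AWS SDK:

```python
def image_region(image_uri):
    # ECR image URIs embed the region:
    # <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>
    return image_uri.split('.dkr.ecr.')[1].split('.amazonaws.com')[0]

uri = '763104351884.dkr.ecr.us-west-2.amazonaws.com/tensorflow-inference:2.12-cpu'
print(image_region(uri))  # us-west-2

# Compare against your configured region before calling model.deploy():
# import boto3
# assert image_region(uri) == boto3.session.Session().region_name
```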
Key Takeaways
- Use AWS SageMaker for scalable, managed ML model deployment with minimal infrastructure setup.
- Upload your trained model artifacts to S3 before creating a SageMaker model for deployment.
- Choose the correct inference container image URI based on your model framework and AWS region.
- Deploy models synchronously or asynchronously depending on your application latency needs.
- Ensure IAM roles have proper permissions to avoid common access and deployment errors.