How to serve a machine learning model in production
Quick answer
To serve a machine learning model in production, deploy it behind an API endpoint using a framework like FastAPI or Flask, containerize it with Docker, and use a model server or managed cloud service for scalability. This setup enables real-time inference by exposing the model as a REST or gRPC service accessible to client applications.
Prerequisites
- Python 3.8+
- pip install fastapi uvicorn scikit-learn
- Basic knowledge of REST APIs
- Docker installed (optional but recommended)
Setup environment
Install the necessary Python packages and set up environment variables for secure deployment.
pip install fastapi uvicorn scikit-learn
Output:
Collecting fastapi
Collecting uvicorn
Collecting scikit-learn
Successfully installed fastapi uvicorn scikit-learn
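The serving code below loads a saved file named model.joblib. If you don't already have one, a minimal training script along these lines produces it; the toy two-feature dataset and LogisticRegression choice here are illustrative stand-ins for your real training pipeline.

```python
# train.py -- minimal sketch: train and save the model the API will load.
# The dataset and classifier are placeholders; swap in your own pipeline.
import joblib
from sklearn.linear_model import LogisticRegression

# Toy dataset: two numeric features per sample, binary label.
X = [[0.0, 0.0], [0.1, 0.2], [2.0, 2.1], [2.2, 1.9]]
y = [0, 0, 1, 1]

model = LogisticRegression().fit(X, y)
joblib.dump(model, "model.joblib")  # the FastAPI app reads this file at startup
```

Run this once before starting the server so joblib.load('model.joblib') succeeds.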
Step by step serving code
This example shows how to serve a trained scikit-learn model with FastAPI for real-time predictions.
from fastapi import FastAPI
from pydantic import BaseModel
import joblib

# Define input data schema
class InputData(BaseModel):
    feature1: float
    feature2: float

# Load pre-trained model
model = joblib.load('model.joblib')

app = FastAPI()

@app.post('/predict')
def predict(data: InputData):
    features = [[data.feature1, data.feature2]]
    prediction = model.predict(features)
    return {'prediction': prediction[0].tolist()}

# To run: uvicorn main:app --host 0.0.0.0 --port 8000
Output:
INFO: Started server process [12345]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
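With the server running, any HTTP client can call the endpoint. A stdlib-only sketch, assuming the server above is listening on localhost:8000 (the actual request is left commented out so the snippet doesn't require a live server):

```python
# client.py -- build a POST request for the /predict endpoint.
import json
import urllib.request

payload = {"feature1": 5.1, "feature2": 3.5}
req = urllib.request.Request(
    "http://localhost:8000/predict",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment once the server is up:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))  # e.g. {"prediction": ...}
```

The field names must match the InputData schema exactly, or FastAPI returns a 422 validation error.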
Common variations
- Use Docker to containerize the app for consistent deployment.
- Switch to gpt-4o or claude-3-5-haiku-20241022 for serving LLMs via API.
- Implement async endpoints with FastAPI for higher throughput.
- Use cloud services like AWS SageMaker, Google Vertex AI, or Azure ML for managed serving.
A minimal Dockerfile for this app:
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY . ./
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
Troubleshooting tips
- If the API returns 500 errors, check the model file path and installed dependencies.
- For slow responses, enable async endpoints or increase the uvicorn worker count (e.g. uvicorn main:app --workers 4).
- Use logging to capture inputs and outputs for debugging.
- Ensure environment variables like API_KEY are set securely in production.
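On the async point above: inside an async def endpoint, a blocking model.predict() call stalls the event loop for every other request. One common pattern is offloading the blocking call to a thread pool. A self-contained sketch with a stub model standing in for the scikit-learn one; in the real app this logic would live inside the FastAPI endpoint:

```python
# Sketch: keep the event loop free while a blocking predict() runs
# in a worker thread. StubModel is a placeholder for the real model.
import asyncio
from concurrent.futures import ThreadPoolExecutor

class StubModel:
    def predict(self, features):  # stands in for the scikit-learn model
        return [sum(row) for row in features]

model = StubModel()
executor = ThreadPoolExecutor(max_workers=4)

async def predict_async(features):
    loop = asyncio.get_running_loop()
    # run_in_executor returns control to the event loop until predict() finishes
    return await loop.run_in_executor(executor, model.predict, features)

result = asyncio.run(predict_async([[1.0, 2.0]]))
print(result)  # [3.0]
```

Note that FastAPI already runs plain def endpoints in a thread pool automatically; this pattern matters when you need async def endpoints that mix awaits (e.g. database calls) with blocking model inference.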
Key Takeaways
- Use lightweight web frameworks like FastAPI to expose ML models as REST APIs.
- Containerize your model server with Docker for portability and scalability.
- Leverage async programming and multiple workers to handle concurrent requests efficiently.
- Managed cloud services simplify production serving but require understanding of deployment pipelines.