How to serve a machine learning model in production
Quick answer
To serve a machine learning model in production, deploy it behind an API endpoint using a framework like FastAPI or Flask, containerize it with Docker, and use a model server or managed cloud service for scalability. This setup enables real-time inference by exposing the model as a REST or gRPC service accessible to client applications.
Prerequisites
- Python 3.8+
- pip install fastapi uvicorn scikit-learn
- Basic knowledge of REST APIs
- Docker installed (optional but recommended)
Setup environment
Install the necessary Python packages and set up environment variables for secure deployment.
pip install fastapi uvicorn scikit-learn
Output:
Collecting fastapi
Collecting uvicorn
Collecting scikit-learn
Successfully installed fastapi uvicorn scikit-learn
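The serving code below loads a saved file named model.joblib. If you don't already have one, a minimal training script along these lines produces it; the toy two-feature dataset and LogisticRegression choice here are illustrative stand-ins for your real training pipeline.

```python
# train.py -- minimal sketch: train and save the model the API will load.
# The dataset and classifier are placeholders; swap in your own pipeline.
import joblib
from sklearn.linear_model import LogisticRegression

# Toy dataset: two numeric features per sample, binary label.
X = [[0.0, 0.0], [0.1, 0.2], [2.0, 2.1], [2.2, 1.9]]
y = [0, 0, 1, 1]

model = LogisticRegression().fit(X, y)
joblib.dump(model, "model.joblib")  # the FastAPI app reads this file at startup
```

Run this once before starting the server so joblib.load('model.joblib') succeeds.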
Step by step serving code
This example shows how to serve a trained scikit-learn model with FastAPI for real-time predictions.
from fastapi import FastAPI
from pydantic import BaseModel
import joblib

# Define input data schema
class InputData(BaseModel):
    feature1: float
    feature2: float

# Load pre-trained model
model = joblib.load('model.joblib')

app = FastAPI()

@app.post('/predict')
def predict(data: InputData):
    features = [[data.feature1, data.feature2]]
    prediction = model.predict(features)
    return {'prediction': prediction[0].tolist()}

# To run: uvicorn main:app --host 0.0.0.0 --port 8000
Output:
INFO: Started server process [12345]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
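With the server running, any HTTP client can call the endpoint. A stdlib-only sketch, assuming the server above is listening on localhost:8000 (the actual request is left commented out so the snippet doesn't require a live server):

```python
# client.py -- build a POST request for the /predict endpoint.
import json
import urllib.request

payload = {"feature1": 5.1, "feature2": 3.5}
req = urllib.request.Request(
    "http://localhost:8000/predict",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment once the server is up:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))  # e.g. {"prediction": ...}
```

The field names must match the InputData schema exactly, or FastAPI returns a 422 validation error.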
Common variations
- Use Docker to containerize the app for consistent deployment.
- Switch to gpt-4o or claude-3-5-haiku-20241022 for serving LLMs via API.
- Implement async endpoints with FastAPI for higher throughput.
- Use cloud services like AWS SageMaker, Google Vertex AI, or Azure ML for managed serving.
A minimal Dockerfile for this app:
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY . ./
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
Troubleshooting tips
- If the API returns 500 errors, check the model file path and installed dependencies.
- For slow responses, enable async endpoints or increase the uvicorn worker count (e.g. uvicorn main:app --workers 4).
- Use logging to capture inputs and outputs for debugging.
- Ensure environment variables like API_KEY are set securely in production.
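On the async point above: inside an async def endpoint, a blocking model.predict() call stalls the event loop for every other request. One common pattern is offloading the blocking call to a thread pool. A self-contained sketch with a stub model standing in for the scikit-learn one; in the real app this logic would live inside the FastAPI endpoint:

```python
# Sketch: keep the event loop free while a blocking predict() runs
# in a worker thread. StubModel is a placeholder for the real model.
import asyncio
from concurrent.futures import ThreadPoolExecutor

class StubModel:
    def predict(self, features):  # stands in for the scikit-learn model
        return [sum(row) for row in features]

model = StubModel()
executor = ThreadPoolExecutor(max_workers=4)

async def predict_async(features):
    loop = asyncio.get_running_loop()
    # run_in_executor returns control to the event loop until predict() finishes
    return await loop.run_in_executor(executor, model.predict, features)

result = asyncio.run(predict_async([[1.0, 2.0]]))
print(result)  # [3.0]
```

Note that FastAPI already runs plain def endpoints in a thread pool automatically; this pattern matters when you need async def endpoints that mix awaits (e.g. database calls) with blocking model inference.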
Key Takeaways
- Use lightweight web frameworks like FastAPI to expose ML models as REST APIs.
- Containerize your model server with Docker for portability and scalability.
- Leverage async programming and multiple workers to handle concurrent requests efficiently.
- Managed cloud services simplify production serving but require understanding of deployment pipelines.