How-to · beginner to intermediate · 4 min read

How to create a REST API for ML model

Quick answer
Use a Python web framework like FastAPI to create a REST API that loads your ML model and exposes prediction endpoints. Implement endpoints that accept input data, run inference with the model, and return predictions as JSON responses.

PREREQUISITES

  • Python 3.9+ (the examples use built-in generic types like list[float])
  • pip install fastapi uvicorn scikit-learn (or your ML framework)
  • Basic knowledge of Python and ML model serialization

Setup

Install FastAPI and uvicorn for the web server, plus your ML framework (e.g., scikit-learn or torch). Save your trained model using serialization (e.g., joblib or pickle).

bash
pip install fastapi uvicorn scikit-learn joblib
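
Before the API can serve predictions, a trained model file must exist on disk. A minimal sketch of that step, assuming a scikit-learn classifier trained on synthetic data (the script name and the model.joblib filename are illustrative; substitute your own training pipeline):

```python
# train.py -- train a small classifier and serialize it for the API
import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data stands in for your real training set
X, y = make_classification(n_samples=200, n_features=4, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# Persist the fitted model; the API loads this file at startup
joblib.dump(model, "model.joblib")
```

Run this once before starting the server so that model.joblib sits next to your API module.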

Step by step

This example shows a complete FastAPI app that loads a serialized scikit-learn model and exposes a POST endpoint for predictions.

python
from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import os

# Define input data schema
class InputData(BaseModel):
    features: list[float]

app = FastAPI()

# Load the ML model at startup
model_path = os.path.join(os.path.dirname(__file__), "model.joblib")
model = joblib.load(model_path)

@app.post("/predict")
def predict(data: InputData):
    # Convert input features to 2D array for sklearn
    features = [data.features]
    prediction = model.predict(features)
    # .tolist() converts NumPy types to JSON-serializable Python values
    return {"prediction": prediction.tolist()[0]}

# To run: uvicorn main:app --reload

Common variations

  • Use async endpoints for better concurrency.
  • Serve models from other frameworks like PyTorch or TensorFlow.
  • Use Docker to containerize the API for deployment.
  • Implement streaming responses for large outputs.
As an example of the first two variations, here is a PyTorch version with an async endpoint:

python
from fastapi import FastAPI
from pydantic import BaseModel
import torch

class InputData(BaseModel):
    features: list[float]

app = FastAPI()

# Assumes the full module was saved with torch.save(model, "model.pt");
# recent PyTorch versions default to weights_only=True, so pass it explicitly
model = torch.load("model.pt", weights_only=False)
model.eval()

@app.post("/predict")
async def predict(data: InputData):
    x = torch.tensor([data.features], dtype=torch.float32)
    with torch.no_grad():
        pred = model(x)
    return {"prediction": pred.numpy().tolist()[0]}

Troubleshooting

  • If you get ModuleNotFoundError when loading the model, install the libraries used at training time (e.g., scikit-learn) in the serving environment; joblib/pickle re-imports them on load.
  • For uvicorn startup errors, confirm the import string (main:app means an object named app in main.py) and that you are running inside the environment where FastAPI is installed.
  • Validate input data schema strictly to avoid runtime errors.
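
Pydantic enforces the schema automatically: a request body that does not match InputData is rejected with a 422 response before your handler runs. A small sketch of that validation outside the API (the features field name matches the schema defined earlier):

```python
from pydantic import BaseModel, ValidationError

class InputData(BaseModel):
    features: list[float]

# Well-formed payload: parses, coercing ints to floats
ok = InputData(features=[1, 2.5, 3])
print(ok.features)  # [1.0, 2.5, 3.0]

# Malformed payload: raises ValidationError,
# which FastAPI translates into a 422 response
try:
    InputData(features=["not", "numbers"])
except ValidationError as exc:
    print(f"rejected: {len(exc.errors())} validation error(s)")
```

Tightening the schema (e.g., fixing the expected number of features) catches bad requests earlier and keeps errors out of the model code.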

Key Takeaways

  • Use FastAPI to quickly build a REST API that serves ML model predictions.
  • Serialize your trained model with joblib or framework-specific tools for loading in the API.
  • Define clear input schemas with Pydantic to validate incoming data.
  • Run the API with uvicorn for local development and containerize for production.
  • Adapt the API to your ML framework and concurrency needs with async endpoints.
Verified 2026-04 · scikit-learn, PyTorch, FastAPI, uvicorn