How-to · beginner to intermediate · 4 min read

How to create a REST API for ML model

Quick answer
Use a Python web framework like FastAPI to create a REST API that loads your ML model and exposes prediction endpoints. Implement endpoints that accept input data, run inference with the model, and return predictions as JSON responses.

PREREQUISITES

  • Python 3.9+ (the examples use built-in generic types like list[float])
  • pip install fastapi uvicorn scikit-learn (or your ML framework)
  • Basic knowledge of Python and ML model serialization

Setup

Install FastAPI and uvicorn for the web server, plus your ML framework (e.g., scikit-learn or torch). Save your trained model using serialization (e.g., joblib or pickle).

bash
pip install fastapi uvicorn scikit-learn joblib
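
Before the API can serve predictions, a trained model file must exist on disk. A minimal sketch of that step, assuming a scikit-learn classifier trained on synthetic data (the script name and the model.joblib filename are illustrative; substitute your own training pipeline):

```python
# train.py -- train a small classifier and serialize it for the API
import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data stands in for your real training set
X, y = make_classification(n_samples=200, n_features=4, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# Persist the fitted model; the API loads this file at startup
joblib.dump(model, "model.joblib")
```

Run this once before starting the server so that model.joblib sits next to your API module.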

Step by step

This example shows a complete FastAPI app that loads a serialized scikit-learn model and exposes a POST endpoint for predictions.

python
from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import os

# Define input data schema
class InputData(BaseModel):
    features: list[float]

app = FastAPI()

# Load the ML model at startup
model_path = os.path.join(os.path.dirname(__file__), "model.joblib")
model = joblib.load(model_path)

@app.post("/predict")
def predict(data: InputData):
    # Convert input features to 2D array for sklearn
    features = [data.features]
    prediction = model.predict(features)
    # .tolist() converts NumPy types to JSON-serializable Python values
    return {"prediction": prediction.tolist()[0]}

# To run: uvicorn main:app --reload

Common variations

  • Use async endpoints for better concurrency.
  • Serve models from other frameworks like PyTorch or TensorFlow.
  • Use Docker to containerize the API for deployment.
  • Implement streaming responses for large outputs.
As an example of the first two variations, here is a PyTorch version with an async endpoint:

python
from fastapi import FastAPI
from pydantic import BaseModel
import torch

class InputData(BaseModel):
    features: list[float]

app = FastAPI()

# Assumes the full module was saved with torch.save(model, "model.pt");
# recent PyTorch versions default to weights_only=True, so pass it explicitly
model = torch.load("model.pt", weights_only=False)
model.eval()

@app.post("/predict")
async def predict(data: InputData):
    x = torch.tensor([data.features], dtype=torch.float32)
    with torch.no_grad():
        pred = model(x)
    return {"prediction": pred.numpy().tolist()[0]}

Troubleshooting

  • If you get ModuleNotFoundError when loading the model, install the libraries used at training time (e.g., scikit-learn) in the serving environment; joblib/pickle re-imports them on load.
  • For uvicorn startup errors, confirm the import string (main:app means an object named app in main.py) and that you are running inside the environment where FastAPI is installed.
  • Validate input data schema strictly to avoid runtime errors.
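
Pydantic enforces the schema automatically: a request body that does not match InputData is rejected with a 422 response before your handler runs. A small sketch of that validation outside the API (the features field name matches the schema defined earlier):

```python
from pydantic import BaseModel, ValidationError

class InputData(BaseModel):
    features: list[float]

# Well-formed payload: parses, coercing ints to floats
ok = InputData(features=[1, 2.5, 3])
print(ok.features)  # [1.0, 2.5, 3.0]

# Malformed payload: raises ValidationError,
# which FastAPI translates into a 422 response
try:
    InputData(features=["not", "numbers"])
except ValidationError as exc:
    print(f"rejected: {len(exc.errors())} validation error(s)")
```

Tightening the schema (e.g., fixing the expected number of features) catches bad requests earlier and keeps errors out of the model code.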

Key Takeaways

  • Use FastAPI to quickly build a REST API that serves ML model predictions.
  • Serialize your trained model with joblib or framework-specific tools for loading in the API.
  • Define clear input schemas with Pydantic to validate incoming data.
  • Run the API with uvicorn for local development and containerize for production.
  • Adapt the API to your ML framework and concurrency needs with async endpoints.
Verified 2026-04 · scikit-learn, PyTorch, FastAPI, uvicorn