How-to · beginner · 3 min read

How to use Flask to serve an ML model

Quick answer
Use Flask to create a lightweight web server that exposes your ML model as an API endpoint. Load your trained model in the Flask app, then handle incoming requests by running inference and returning predictions as JSON.

PREREQUISITES

  • Python 3.8+
  • pip install flask
  • A trained ML model saved to disk (e.g., as a pickle, joblib, or torch file)
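If you don't yet have a saved model file, a minimal sketch using the standard library's pickle (for scikit-learn models, `joblib.dump` works the same way; `DummyModel` here is a stand-in for your actual trained model):

```python
import pickle

class DummyModel:
    """Stand-in for a real trained model (e.g., from scikit-learn)."""
    def predict(self, features):
        # Toy rule: predict the sum of each feature row.
        return [sum(row) for row in features]

# Serialize the model so the Flask app can load it at startup.
with open('model.pkl', 'wb') as f:
    pickle.dump(DummyModel(), f)
```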

Setup

Install Flask via pip and prepare your ML model file for loading. Ensure your environment has Python 3.8 or newer.

bash
pip install flask

Step by step

This example shows how to serve a scikit-learn model using Flask. The model is loaded once at startup, and the API accepts JSON input to return predictions.

python
from flask import Flask, request, jsonify
import joblib

app = Flask(__name__)

# Load your trained ML model (replace 'model.pkl' with your model file)
model = joblib.load('model.pkl')

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json(force=True)
    # Assume input is a list of feature lists
    features = data.get('features')
    if not features:
        return jsonify({'error': 'No features provided'}), 400
    # Run prediction
    preds = model.predict(features)
    # Convert predictions to list for JSON serialization
    return jsonify({'predictions': preds.tolist()})

if __name__ == '__main__':
    # debug=True enables auto-reload and verbose errors; development only
    app.run(host='0.0.0.0', port=5000, debug=True)

output
 * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
 * Restarting with stat
 * Debugger is active!
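Once the server is running, any HTTP client can call the endpoint. A sketch of the request payload and call using only the standard library (the feature values and row width of 4 are illustrative; match whatever your model was trained on):

```python
import json

# Batch of two feature rows; each inner list must have as many values
# as the model expects (4 here, purely illustrative).
payload = json.dumps({'features': [[5.1, 3.5, 1.4, 0.2],
                                   [6.2, 3.4, 5.4, 2.3]]})

# Send it (assumes the server above is running on localhost:5000):
# import urllib.request
# req = urllib.request.Request(
#     'http://localhost:5000/predict',
#     data=payload.encode(),
#     headers={'Content-Type': 'application/json'},
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```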

Common variations

  • Flask 2.0+ supports async route handlers (async def); for a fully async stack, consider Quart, a Flask-compatible ASGI framework.
  • Serve models from other frameworks like TensorFlow or PyTorch by loading their respective model files.
  • Use environment variables or config files to manage model paths and server settings.
  • Implement streaming responses for large outputs or batch predictions.
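For the environment-variable variation, a minimal sketch using the standard library (the variable names MODEL_PATH and PORT are illustrative, not a Flask convention):

```python
import os

# Fall back to sensible defaults when the variables are unset.
MODEL_PATH = os.environ.get('MODEL_PATH', 'model.pkl')
PORT = int(os.environ.get('PORT', '5000'))

# The Flask app would then use:
#   model = joblib.load(MODEL_PATH)
#   app.run(host='0.0.0.0', port=PORT)
```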

Troubleshooting

  • If you get ModuleNotFoundError, ensure Flask is installed in your environment.
  • For JSON parsing errors, verify the client sends valid JSON. With get_json(force=True), as in the example above, the Content-Type header is ignored; without force=True, the client must send Content-Type: application/json.
  • If predictions fail, check the input shape matches what the model expects.
  • Use debug=True in development to get detailed error messages.
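To catch shape mismatches before they reach the model, you can validate the payload first. A sketch (the helper name and the expected row width of 4 are illustrative):

```python
def validate_features(features, n_features=4):
    """Return an error message, or None if the input is well-formed."""
    if not isinstance(features, list) or not features:
        return 'features must be a non-empty list of rows'
    for row in features:
        if not isinstance(row, list) or len(row) != n_features:
            return f'each row must contain exactly {n_features} values'
        if not all(isinstance(x, (int, float)) for x in row):
            return 'all feature values must be numeric'
    return None
```

In the /predict handler, call this before model.predict and return a 400 response with the message when it is not None.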

Key Takeaways

  • Load your ML model once at Flask app startup to optimize performance.
  • Expose a POST endpoint that accepts JSON input and returns JSON predictions.
  • Validate input data format to avoid runtime errors during inference.
Verified 2026-04