How to use BentoML for model serving
Quick answer
Use BentoML to package your ML model into a service by defining a bentoml.Service, then serve it locally or deploy it to production with built-in APIs. BentoML simplifies model serving by handling REST API creation, containerization, and scaling.
Prerequisites
- Python 3.8+
- pip install "bentoml>=1.0"
- A trained ML model (e.g., scikit-learn, PyTorch, TensorFlow)
- Basic knowledge of Python and REST APIs
Setup
Install BentoML via pip and prepare your environment variables if deploying to cloud or container platforms.
pip install "bentoml>=1.0"
Step by step
Save your trained model to BentoML's model store, define a bentoml.Service with a prediction API, and serve it locally.
import bentoml
from bentoml.io import JSON
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

# Train a simple model
iris = load_iris()
model = RandomForestClassifier()
model.fit(iris.data, iris.target)

# Save the model to BentoML's local model store
saved_model = bentoml.sklearn.save_model("iris_clf", model)
print(f"Model saved: {saved_model.tag}")

# Define the service (in a real project this lives in service.py)
runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
svc = bentoml.Service("iris_classifier", runners=[runner])

@svc.api(input=JSON(), output=JSON())
def predict(parsed_json):
    data = parsed_json["data"]
    prediction = runner.predict.run(data)
    return {"prediction": prediction.tolist()}

# Serve the model locally
# Run this in terminal: bentoml serve service.py:svc
output
Model saved: iris_clf:<generated tag>
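Once the server is running, the predict endpoint accepts the {"data": [...]} payload defined above. A minimal client sketch using only the standard library — the URL assumes bentoml serve's default local address (127.0.0.1, port 3000), and the actual request is commented out so the snippet runs without a live server:

```python
import json
from urllib.request import Request, urlopen

# Payload matching the {"data": [...]} shape the predict API expects
payload = {"data": [[5.1, 3.5, 1.4, 0.2]]}
body = json.dumps(payload).encode("utf-8")

# Default local address for `bentoml serve` (assumption: port 3000)
req = Request(
    "http://127.0.0.1:3000/predict",
    data=body,
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is up:
# with urlopen(req) as resp:
#     print(json.loads(resp.read()))

print(body.decode("utf-8"))
```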
Common variations
- Use the bentoml serve CLI to serve the model locally with a REST API.
- Deploy to cloud platforms by using bentoml containerize to build Docker images.
- Support async APIs by defining async functions under @svc.api.
- Use different bentoml.io descriptors like Image, Text, or PandasDataFrame for varied data types.
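The containerize variation follows a build-then-containerize workflow. A sketch of the commands, assuming a bentofile.yaml in the working directory and a running Docker daemon (the image tag on your machine will differ):

```shell
# Package the service and its dependencies into a Bento
bentoml build

# Build a Docker image from the latest Bento
bentoml containerize iris_classifier:latest

# Run the image, exposing the default serving port
docker run --rm -p 3000:3000 iris_classifier:latest
```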
import bentoml
from bentoml.io import JSON

svc = bentoml.Service("async_service")

@svc.api(input=JSON(), output=JSON())
async def predict(parsed_json):
    # async prediction logic here
    return {"result": "async response"}
Troubleshooting
- If bentoml serve fails, check that the model exists in the model store (bentoml models list) and that the service module is importable.
- For dependency issues, declare the required packages in bentofile.yaml so they are bundled with the Bento.
- When building Docker images, ensure the Docker daemon is running and you have permission to use it.
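For the dependency fix above, a minimal bentofile.yaml sketch — the service entry assumes the bentoml.Service object is named svc in service.py, and the package list is illustrative; declare whatever your model actually needs:

```yaml
service: "service:svc"   # module:variable of the bentoml.Service instance
include:
  - "*.py"               # source files to bundle into the Bento
python:
  packages:
    - scikit-learn       # installed inside the Bento / Docker image
```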
Key Takeaways
- Use bentoml.Service to wrap and serve ML models with REST APIs effortlessly.
- BentoML automates packaging, dependency management, and deployment workflows.
- Local serving and containerization are built-in, enabling an easy transition from dev to production.
- Support for async APIs and multiple input/output descriptors makes BentoML flexible for various use cases.