How to use Prometheus for ML monitoring
Quick answer

Use Prometheus to monitor ML models by instrumenting your model serving code with Prometheus client libraries to expose metrics endpoints. Then configure the Prometheus server to scrape these metrics, enabling real-time tracking of model performance, resource usage, and alerting.

Prerequisites

- Python 3.8+
- Prometheus installed (server and client libraries)
- Basic knowledge of ML model serving
- pip install prometheus-client
- Access to Grafana (optional, for visualization)
Setup Prometheus and client library
Install the prometheus-client Python package to expose metrics from your ML model server. Also, install and run the Prometheus server to scrape these metrics.
Define scrape targets in the Prometheus server's configuration file (prometheus.yml) so it knows which endpoints to poll.
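A minimal scrape configuration might look like the following sketch; the job name and target are illustrative and should be adjusted to match your deployment:

```yaml
# prometheus.yml -- minimal example; job name and target are placeholders
global:
  scrape_interval: 15s              # how often Prometheus scrapes targets

scrape_configs:
  - job_name: 'ml-model-server'     # hypothetical job name
    static_configs:
      - targets: ['localhost:8000'] # host:port of the /metrics endpoint
```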
```shell
pip install prometheus-client
```

output

```
Collecting prometheus-client
  Downloading prometheus_client-0.17.0-py3-none-any.whl (58 kB)
Installing collected packages: prometheus-client
Successfully installed prometheus-client-0.17.0
```
Step by step: instrument ML model server
Use prometheus_client to create metrics like counters, gauges, and histograms in your ML serving code. Expose a /metrics HTTP endpoint for Prometheus to scrape.
```python
from prometheus_client import start_http_server, Summary, Counter
import random
import time

# Create a metric to track prediction latency
PREDICTION_LATENCY = Summary('ml_prediction_latency_seconds', 'Time spent processing prediction')
# Counter for total predictions
PREDICTIONS_TOTAL = Counter('ml_predictions_total', 'Total number of predictions made')

@PREDICTION_LATENCY.time()
def process_prediction():
    # Simulate prediction processing time
    time.sleep(random.uniform(0.1, 0.5))
    PREDICTIONS_TOTAL.inc()

if __name__ == '__main__':
    # Start Prometheus metrics server on port 8000
    start_http_server(8000)
    print('Prometheus metrics available at http://localhost:8000/metrics')
    while True:
        process_prediction()
```

output

```
Prometheus metrics available at http://localhost:8000/metrics
# The /metrics endpoint then serves data like:
# ml_prediction_latency_seconds_count 10
# ml_predictions_total 10
```
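A Summary records only a count and a sum; if you want latency quantiles computed in Prometheus itself, a Histogram with explicit buckets is the usual alternative. Here is a minimal sketch; the metric name, bucket boundaries, and the model_version label are illustrative choices, not fixed conventions:

```python
from prometheus_client import Histogram, generate_latest

# Buckets should bracket your expected latency range.
# The 'model_version' label is illustrative; keep label values low-cardinality.
LATENCY_HIST = Histogram(
    'ml_prediction_latency_hist_seconds',
    'Prediction latency in seconds',
    ['model_version'],
    buckets=(0.05, 0.1, 0.25, 0.5, 1.0, 2.5),
)

# Record one observation for a specific label value
LATENCY_HIST.labels(model_version='v1').observe(0.2)

# generate_latest renders the current exposition text, as served at /metrics
exposition = generate_latest().decode()
```

With bucketed data in place, a p95 can then be computed in PromQL with `histogram_quantile(0.95, rate(ml_prediction_latency_hist_seconds_bucket[5m]))`.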
Common variations
- Use async frameworks like FastAPI with prometheus_client middleware for non-blocking metrics.
- Integrate with Grafana for dashboards visualizing ML metrics.
- Configure alerting rules in Prometheus to notify on anomalies like high latency or error rates.
- Use exporters or sidecars if your model server language lacks native Prometheus clients.
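As a sketch of the alerting-rule idea, a rule file flagging sustained high average latency, derived from the ml_prediction_latency_seconds Summary above, might look like this (the rule name, threshold, and durations are placeholders to tune against your own SLOs):

```yaml
# alert_rules.yml -- illustrative rule; threshold and durations are examples
groups:
  - name: ml-model-alerts
    rules:
      - alert: HighPredictionLatency
        # average latency over 5m, from the Summary's sum/count series
        expr: >
          rate(ml_prediction_latency_seconds_sum[5m])
            / rate(ml_prediction_latency_seconds_count[5m]) > 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Average prediction latency above 500 ms for 10 minutes"
```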
Troubleshooting
- If the metrics endpoint is not reachable, check firewall and port configuration.
- If Prometheus shows no data, verify scrape configs and endpoint URL.
- High cardinality metrics can cause performance issues; avoid labels with high variability.
- Log exceptions raised during metric registration to debug errors; prometheus_client raises ValueError when a metric name is registered twice.
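To check an endpoint independently of Prometheus, you can fetch it directly and inspect the exposition text. A quick stdlib sketch, assuming the server from the example above is embedded here on the arbitrary port 8001:

```python
from urllib.request import urlopen
from prometheus_client import start_http_server

# Expose the default registry on an arbitrary free port
start_http_server(8001)

# Fetch the exposition text exactly as Prometheus would
body = urlopen('http://localhost:8001/metrics', timeout=5).read().decode()
print(body.splitlines()[0])  # first line is a "# HELP ..." comment
```

If this request fails or returns something other than plain-text metric lines, Prometheus scrapes will fail too, so fix the endpoint before debugging scrape configs.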
Key Takeaways
- Instrument ML serving code with Prometheus client libraries to expose metrics endpoints.
- Configure Prometheus server to scrape these endpoints regularly for real-time monitoring.
- Use Grafana dashboards and Prometheus alerting rules to visualize and get notified on ML model health.
- Avoid high cardinality labels in metrics to maintain Prometheus performance.
- Test metrics endpoints independently to ensure Prometheus can scrape them successfully.