How to monitor a Vertex AI model

Quick answer

Use the google-cloud-monitoring Python client to monitor your Vertex AI model by querying metrics like prediction latency and error rates. Set up alerting policies and dashboards in Google Cloud Monitoring to track model performance and health in real time.

PREREQUISITES

Python 3.8+
Google Cloud project with Vertex AI enabled
Service account with Monitoring Viewer and Vertex AI User roles
pip install google-cloud-monitoring google-auth

Setup

Install the required Google Cloud Monitoring client library and authenticate with a service account that has permissions to access Vertex AI metrics.

bash

pip install google-cloud-monitoring google-auth
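Before creating a client, it can help to confirm that the key file you plan to use is readable and looks like a service account key. The sketch below is one possible sanity check, not part of the Google client library: the helper name `check_service_account_key` and the chosen field list are assumptions made for illustration.

```python
import json
from pathlib import Path

# Fields found in a standard service account key file (illustrative subset).
REQUIRED_FIELDS = ("type", "project_id", "client_email", "private_key")


def check_service_account_key(path):
    """Return a list of problems found with the key file (empty if none).

    Hypothetical helper: it only inspects the file locally and does not
    contact Google Cloud or verify IAM roles.
    """
    key_path = Path(path)
    if not key_path.is_file():
        return [f"key file not found: {key_path}"]
    try:
        data = json.loads(key_path.read_text())
    except json.JSONDecodeError as exc:
        return [f"key file is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["key file does not contain a JSON object"]

    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]
    if "type" in data and data["type"] != "service_account":
        problems.append(f"unexpected key type: {data['type']}")
    return problems
```

Calling `check_service_account_key("/path/to/service-account.json")` returns an empty list when the file passes these local checks. It says nothing about whether the account holds the required roles.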

Step by step

This example shows how to query Vertex AI model prediction latency metrics using the Google Cloud Monitoring API in Python.

python

import os
import time

from google.cloud import monitoring_v3

# Set path to your service account key JSON
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account.json"

project_id = "your-gcp-project-id"
client = monitoring_v3.MetricServiceClient()

project_name = f"projects/{project_id}"

# Query Vertex AI prediction latency metric.
# Metric type as used in this article; confirm the exact name against the
# Vertex AI entries in the Cloud Monitoring metrics list before relying on it.
metric_type = "aiplatform.googleapis.com/prediction/latency"

# Query window: the last hour, ending now
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {
        "end_time": {"seconds": now},
        "start_time": {"seconds": now - 3600},
    }
)

aggregation = monitoring_v3.Aggregation(
    alignment_period={'seconds': 60},
    per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_MEAN,
)

results = client.list_time_series(
    request={
        "name": project_name,
        "filter": f"metric.type = \"{metric_type}\"",
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
        "aggregation": aggregation,
    }
)

for time_series in results:
    print(f"Metric labels: {time_series.metric.labels}")
    for point in time_series.points:
        # end_time is already a timezone-aware datetime in google-cloud-monitoring 2.x
        print(f"Time: {point.interval.end_time}, Latency (ms): {point.value.double_value}")

output

Metric labels: {'model_id': '1234567890', 'endpoint_id': '0987654321'}
Time: 2026-04-26 15:00:00+00:00, Latency (ms): 120.5
Time: 2026-04-26 15:01:00+00:00, Latency (ms): 115.3
...
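Printing every point is useful for a first look, but a short summary is often easier to act on. The sketch below reduces a list of latency values to a mean, a maximum, and a nearest-rank 95th percentile. The helper name `summarize_latencies` is hypothetical, and the input is assumed to be the `double_value` readings collected from the query above.

```python
import statistics


def summarize_latencies(values_ms):
    """Return mean, max, and nearest-rank p95 of latency values in ms.

    Returns None for an empty input so callers can tell "no data" apart
    from a real reading.
    """
    if not values_ms:
        return None
    ordered = sorted(values_ms)
    # Nearest-rank percentile: ceil(0.95 * n), computed with integers
    # to avoid floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return {
        "mean": statistics.fmean(ordered),
        "max": ordered[-1],
        "p95": ordered[rank - 1],
    }
```

Because the query aligns data to one-minute means, the percentile here describes the distribution of per-minute averages, not of individual requests. The values can be gathered with a comprehension such as `[p.value.double_value for ts in results for p in ts.points]`.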

Common variations

Change the metric.type value in the filter to query other metrics, such as aiplatform.googleapis.com/prediction/error_count for error monitoring. Create alerting policies in Google Cloud Console to get notified when a metric crosses a threshold. Use Google Cloud Logging to collect detailed prediction logs for troubleshooting.
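An alerting policy can also be described as JSON and created outside the Console. The sketch below builds such a description for the error count metric as a plain dictionary, using the field names of the Cloud Monitoring AlertPolicy REST resource. The function name, the default threshold, the `ALIGN_SUM` aligner, and the `aiplatform.googleapis.com/Endpoint` resource type are assumptions for illustration, so check them against your project before use.

```python
import json


def build_error_count_policy(threshold=5, duration_seconds=300):
    """Build an AlertPolicy description as a JSON-serializable dict.

    Hypothetical helper: it only assembles the document and does not
    call the Monitoring API.
    """
    # Metric name as given in this article; verify it in your project.
    metric_type = "aiplatform.googleapis.com/prediction/error_count"
    condition_filter = (
        f'metric.type = "{metric_type}" AND '
        # Assumed monitored resource type for deployed endpoints.
        'resource.type = "aiplatform.googleapis.com/Endpoint"'
    )
    return {
        "displayName": "Vertex AI prediction errors",
        "combiner": "OR",
        "conditions": [
            {
                "displayName": f"Error count above {threshold} per minute",
                "conditionThreshold": {
                    "filter": condition_filter,
                    "comparison": "COMPARISON_GT",
                    "thresholdValue": threshold,
                    # The condition must hold for this long before firing.
                    "duration": f"{duration_seconds}s",
                    "aggregations": [
                        {"alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_SUM"}
                    ],
                },
            }
        ],
    }


policy_json = json.dumps(build_error_count_policy(), indent=2)
```

The resulting JSON can be saved to a file and reviewed before it is submitted. No notification channels are attached here, so a policy created from it would open incidents without sending messages until channels are added.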

Troubleshooting

If you get permission errors, verify your service account has roles/monitoring.viewer and roles/aiplatform.user. If no metrics appear, confirm your Vertex AI model is deployed and receiving traffic. Check that the metric.type matches the exact Vertex AI metric names.
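One way to check metric names is to ask Cloud Monitoring which Vertex AI metric descriptors the project exposes. The sketch below wraps `list_metric_descriptors` with a `starts_with` filter. The helper name `list_vertex_metric_types` is hypothetical, and `client` is assumed to be a `monitoring_v3.MetricServiceClient` like the one created in the step-by-step example.

```python
def list_vertex_metric_types(client, project_id):
    """Return the sorted Vertex AI metric types visible in the project.

    `client` is expected to be a monitoring_v3.MetricServiceClient
    (any object with a compatible list_metric_descriptors method works).
    """
    descriptors = client.list_metric_descriptors(
        request={
            "name": f"projects/{project_id}",
            # Restrict the listing to metrics published by Vertex AI.
            "filter": 'metric.type = starts_with("aiplatform.googleapis.com/")',
        }
    )
    return sorted(descriptor.type for descriptor in descriptors)
```

Printing the result of `list_vertex_metric_types(client, project_id)` shows the names that are valid in a `metric.type` filter. A permission error from this call points at the IAM roles, not at the filter.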

Key Takeaways

Use the Google Cloud Monitoring API to programmatically access Vertex AI model metrics.
Query metrics like prediction latency and error count to monitor model health.
Set up alerting policies in Google Cloud Console for proactive notifications.
Ensure proper IAM roles for your service account to access monitoring data.

Verified 2026-04 · gemini-2.5-pro, gpt-4o, claude-3-5-sonnet-20241022
