How-to · Beginner · 3 min read

How to use Vertex AI online prediction

Quick answer
Use the google-cloud-aiplatform Python SDK to perform online prediction: initialize the SDK with your Google Cloud project and location, reference the endpoint your model is deployed to, and call predict() with input instances. This enables real-time inference against your deployed Vertex AI models.

PREREQUISITES

  • Python 3.8+
  • Google Cloud project with Vertex AI enabled
  • Service account with Vertex AI permissions
  • Google Cloud SDK installed and authenticated
  • pip install google-cloud-aiplatform

Setup

Install the required Python packages and authenticate your Google Cloud environment.

  • Install the google-cloud-aiplatform package (the vertexai module ships with it).
  • Set up authentication with a service account key or gcloud auth application-default login.
  • Set environment variables for your Google Cloud project and region.
bash
pip install google-cloud-aiplatform
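
As a small sketch, the project and region can be read from environment variables in Python, with fallbacks for local experimentation. The variable name GOOGLE_CLOUD_REGION is an illustrative convention, not something the SDK requires:

```python
import os

# Read project and region from the environment, with illustrative fallbacks
project_id = os.environ.get("GOOGLE_CLOUD_PROJECT", "your-project-id")
location = os.environ.get("GOOGLE_CLOUD_REGION", "us-central1")

print(f"Using project {project_id} in {location}")
```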

Step by step

This example performs online prediction against a deployed Vertex AI model. Online prediction is served through an endpoint, so the code references the endpoint the model is deployed to and calls its predict() method.

python
import os

from google.cloud import aiplatform

# Set your Google Cloud project and location
project_id = os.environ.get("GOOGLE_CLOUD_PROJECT", "your-project-id")
location = "us-central1"

# Initialize the Vertex AI SDK
aiplatform.init(project=project_id, location=location)

# Online prediction is served through an endpoint, not the model itself.
# Replace with the resource name of the endpoint your model is deployed to.
endpoint_name = "projects/your-project-id/locations/us-central1/endpoints/your-endpoint-id"
endpoint = aiplatform.Endpoint(endpoint_name)

# Prepare input instance(s); the structure must match your model's input schema
instances = [
    {"content": "Example input text for prediction"}
]

# Call the predict method
response = endpoint.predict(instances=instances)

# Print prediction results
print("Prediction response:", response.predictions)
output
Prediction response: [{'output': 'Predicted result text or values'}]
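
The predictions field is a plain Python list whose element structure depends on your model's output schema. As a minimal, stdlib-only sketch (assuming a hypothetical model that returns one dict per instance with an "output" key, as in the sample output above), you might post-process it like this:

```python
# Illustrative sketch: post-process the predictions list returned by
# endpoint.predict(). The {"output": ...} element shape is an assumption;
# inspect your own model's output schema before relying on particular keys.
def extract_outputs(predictions):
    """Pull the 'output' value from each prediction dict, skipping malformed entries."""
    return [p["output"] for p in predictions if isinstance(p, dict) and "output" in p]

# Example with the shape shown in the sample output above
sample_predictions = [{"output": "Predicted result text or values"}]
print(extract_outputs(sample_predictions))
```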

Common variations

  • Async prediction: Use the PredictionServiceAsyncClient from google.cloud.aiplatform_v1 with asyncio for asynchronous calls.
  • Batch prediction: Use Vertex AI batch prediction jobs for large datasets instead of online prediction.
  • Different input types: Adjust instances format based on your model input schema (e.g., images, tabular data).
  • Low-level client: You can also call the PredictionServiceClient from google.cloud.aiplatform_v1 directly for finer control over requests.
python
import asyncio

from google.cloud import aiplatform_v1
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value

async def async_predict():
    # The async client must point at the regional API endpoint
    client = aiplatform_v1.PredictionServiceAsyncClient(
        client_options={"api_endpoint": "us-central1-aiplatform.googleapis.com"}
    )
    endpoint = "projects/your-project-id/locations/us-central1/endpoints/your-endpoint-id"
    # The low-level API expects protobuf Value instances, not plain dicts
    instances = [json_format.ParseDict({"content": "Async input example"}, Value())]
    response = await client.predict(endpoint=endpoint, instances=instances)
    print("Async prediction response:", response.predictions)

asyncio.run(async_predict())
output
Async prediction response: [{'output': 'Predicted result text or values'}]
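
The shape of each instance depends entirely on the model's input schema. The dicts below are illustrative examples of common shapes (text, tabular, base64-encoded image); the key names are assumptions, not a schema any particular model is guaranteed to accept:

```python
import base64

# Illustrative instance shapes; check your own model's instance schema
# before using any of these key names.

# Text model: a single content field
text_instance = {"content": "Example input text for prediction"}

# Tabular model: one key per feature column
tabular_instance = {"age": 42, "income": 55000.0, "country": "DE"}

# Image model: raw bytes are typically base64-encoded for the JSON payload
image_bytes = b"\x89PNG..."  # placeholder for real image bytes
image_instance = {"image_bytes": {"b64": base64.b64encode(image_bytes).decode("utf-8")}}

print(text_instance)
print(tabular_instance)
print(image_instance)
```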

Troubleshooting

  • If you get PermissionDenied, verify your service account has Vertex AI User role.
  • If the endpoint is not found, confirm the endpoint resource name is correct and that the model is actually deployed to it.
  • For authentication errors, ensure GOOGLE_APPLICATION_CREDENTIALS points to a valid service account JSON key or use gcloud auth application-default login.
  • Check network connectivity and firewall rules if requests time out.
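
For transient timeouts, a simple retry loop with exponential backoff often helps. A minimal, library-agnostic sketch; call_prediction is a stand-in for your actual endpoint.predict() call, not a Vertex AI API:

```python
import time

def predict_with_retry(call_prediction, max_attempts=3, base_delay=1.0):
    """Retry a flaky prediction call with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(max_attempts):
        try:
            return call_prediction()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Example with a stand-in callable that fails once, then succeeds
state = {"calls": 0}
def flaky():
    state["calls"] += 1
    if state["calls"] < 2:
        raise TimeoutError("simulated timeout")
    return "ok"

print(predict_with_retry(flaky, base_delay=0.01))
```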

Key Takeaways

  • Use aiplatform.Endpoint.predict() for online prediction against deployed models.
  • Always set your Google Cloud project and location before initializing the SDK.
  • Input instances must match your model's expected input schema for successful prediction.
  • Use asynchronous calls for improved performance in concurrent prediction scenarios.
  • Verify IAM permissions and authentication setup to avoid common errors.
Verified 2026-04 · aiplatform.Endpoint, aiplatform_v1.PredictionServiceAsyncClient