How-to · Beginner · 3 min read

How to use Vertex AI online prediction

Quick answer
Use the google-cloud-aiplatform Python SDK to perform online prediction: initialize the SDK with your Google Cloud project and location, reference the endpoint your model is deployed to, and call predict() with input instances. This enables real-time inference against your deployed Vertex AI models.

PREREQUISITES

  • Python 3.8+
  • Google Cloud project with Vertex AI enabled
  • Service account with Vertex AI permissions
  • Google Cloud SDK installed and authenticated
  • pip install google-cloud-aiplatform

Setup

Install the required Python packages and authenticate your Google Cloud environment.

  • Install the google-cloud-aiplatform package (the vertexai module ships with it).
  • Set up authentication with a service account key or gcloud auth application-default login.
  • Set environment variables for your Google Cloud project and region.
bash
pip install google-cloud-aiplatform
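
As a small sketch, the project and region can be read from environment variables in Python, with fallbacks for local experimentation. The variable name GOOGLE_CLOUD_REGION is an illustrative convention, not something the SDK requires:

```python
import os

# Read project and region from the environment, with illustrative fallbacks
project_id = os.environ.get("GOOGLE_CLOUD_PROJECT", "your-project-id")
location = os.environ.get("GOOGLE_CLOUD_REGION", "us-central1")

print(f"Using project {project_id} in {location}")
```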

Step by step

This example performs online prediction against a deployed Vertex AI model. Online prediction is served through an endpoint, so the code references the endpoint the model is deployed to and calls its predict() method.

python
import os

from google.cloud import aiplatform

# Set your Google Cloud project and location
project_id = os.environ.get("GOOGLE_CLOUD_PROJECT", "your-project-id")
location = "us-central1"

# Initialize the Vertex AI SDK
aiplatform.init(project=project_id, location=location)

# Online prediction is served through an endpoint, not the model itself.
# Replace with the resource name of the endpoint your model is deployed to.
endpoint_name = "projects/your-project-id/locations/us-central1/endpoints/your-endpoint-id"
endpoint = aiplatform.Endpoint(endpoint_name)

# Prepare input instance(s); the structure must match your model's input schema
instances = [
    {"content": "Example input text for prediction"}
]

# Call the predict method
response = endpoint.predict(instances=instances)

# Print prediction results
print("Prediction response:", response.predictions)
output
Prediction response: [{'output': 'Predicted result text or values'}]
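
The predictions field is a plain Python list whose element structure depends on your model's output schema. As a minimal, stdlib-only sketch (assuming a hypothetical model that returns one dict per instance with an "output" key, as in the sample output above), you might post-process it like this:

```python
# Illustrative sketch: post-process the predictions list returned by
# endpoint.predict(). The {"output": ...} element shape is an assumption;
# inspect your own model's output schema before relying on particular keys.
def extract_outputs(predictions):
    """Pull the 'output' value from each prediction dict, skipping malformed entries."""
    return [p["output"] for p in predictions if isinstance(p, dict) and "output" in p]

# Example with the shape shown in the sample output above
sample_predictions = [{"output": "Predicted result text or values"}]
print(extract_outputs(sample_predictions))
```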

Common variations

  • Async prediction: Use the PredictionServiceAsyncClient from google.cloud.aiplatform_v1 with asyncio for asynchronous calls.
  • Batch prediction: Use Vertex AI batch prediction jobs for large datasets instead of online prediction.
  • Different input types: Adjust instances format based on your model input schema (e.g., images, tabular data).
  • Low-level client: You can also call the PredictionServiceClient from google.cloud.aiplatform_v1 directly for finer control over requests.
python
import asyncio

from google.cloud import aiplatform_v1
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value

async def async_predict():
    # The async client must point at the regional API endpoint
    client = aiplatform_v1.PredictionServiceAsyncClient(
        client_options={"api_endpoint": "us-central1-aiplatform.googleapis.com"}
    )
    endpoint = "projects/your-project-id/locations/us-central1/endpoints/your-endpoint-id"
    # The low-level API expects protobuf Value instances, not plain dicts
    instances = [json_format.ParseDict({"content": "Async input example"}, Value())]
    response = await client.predict(endpoint=endpoint, instances=instances)
    print("Async prediction response:", response.predictions)

asyncio.run(async_predict())
output
Async prediction response: [{'output': 'Predicted result text or values'}]
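
The shape of each instance depends entirely on the model's input schema. The dicts below are illustrative examples of common shapes (text, tabular, base64-encoded image); the key names are assumptions, not a schema any particular model is guaranteed to accept:

```python
import base64

# Illustrative instance shapes; check your own model's instance schema
# before using any of these key names.

# Text model: a single content field
text_instance = {"content": "Example input text for prediction"}

# Tabular model: one key per feature column
tabular_instance = {"age": 42, "income": 55000.0, "country": "DE"}

# Image model: raw bytes are typically base64-encoded for the JSON payload
image_bytes = b"\x89PNG..."  # placeholder for real image bytes
image_instance = {"image_bytes": {"b64": base64.b64encode(image_bytes).decode("utf-8")}}

print(text_instance)
print(tabular_instance)
print(image_instance)
```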

Troubleshooting

  • If you get PermissionDenied, verify your service account has Vertex AI User role.
  • If the endpoint is not found, confirm the endpoint resource name is correct and that the model is actually deployed to it.
  • For authentication errors, ensure GOOGLE_APPLICATION_CREDENTIALS points to a valid service account JSON key or use gcloud auth application-default login.
  • Check network connectivity and firewall rules if requests time out.
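
For transient timeouts, a simple retry loop with exponential backoff often helps. A minimal, library-agnostic sketch; call_prediction is a stand-in for your actual endpoint.predict() call, not a Vertex AI API:

```python
import time

def predict_with_retry(call_prediction, max_attempts=3, base_delay=1.0):
    """Retry a flaky prediction call with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(max_attempts):
        try:
            return call_prediction()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Example with a stand-in callable that fails once, then succeeds
state = {"calls": 0}
def flaky():
    state["calls"] += 1
    if state["calls"] < 2:
        raise TimeoutError("simulated timeout")
    return "ok"

print(predict_with_retry(flaky, base_delay=0.01))
```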

Key Takeaways

  • Use aiplatform.Endpoint.predict() for online prediction against deployed models.
  • Always set your Google Cloud project and location before initializing the SDK.
  • Input instances must match your model's expected input schema for successful prediction.
  • Use asynchronous calls for improved performance in concurrent prediction scenarios.
  • Verify IAM permissions and authentication setup to avoid common errors.
Verified 2026-04 · aiplatform.Endpoint, aiplatform_v1.PredictionServiceAsyncClient