How-to · Intermediate · 3 min read

How to deploy Gemini model on Vertex AI

Quick answer
To deploy the gemini-1.5-pro model on Vertex AI, use the Vertex AI Python SDK (google-cloud-aiplatform) to create an endpoint, upload the model with its serving container image, and deploy the model to that endpoint. Authenticate with Google Cloud, configure the deployment parameters, and then send prediction requests to the endpoint. This gives you scalable, managed hosting on Vertex AI.

Prerequisites

  • Python 3.8+
  • Google Cloud account with Vertex AI enabled
  • Google Cloud SDK installed and authenticated
  • pip install google-cloud-aiplatform
  • Google Cloud project with billing enabled

Setup

Install the google-cloud-aiplatform Python package and authenticate your Google Cloud SDK. Set environment variables for your project ID, region, and model details.

bash
pip install google-cloud-aiplatform
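If you have not already authenticated, a minimal setup might look like the following. The project ID and region values are placeholders; replace them with your own, and run the commented `gcloud` command interactively:

```shell
# Authenticate once (interactive; uncomment to run):
# gcloud auth application-default login

# Environment variables read by the Python example below.
# The project ID here is a placeholder.
export GOOGLE_CLOUD_PROJECT="my-project-id"
export GOOGLE_CLOUD_REGION="us-central1"

echo "Project: $GOOGLE_CLOUD_PROJECT, region: $GOOGLE_CLOUD_REGION"
</imports>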

Step by step

This example shows how to deploy the gemini-1.5-pro model on Vertex AI by creating an endpoint and deploying the model to it. Replace PROJECT_ID and REGION with your Google Cloud project and region.

python
from google.cloud import aiplatform
import os

# Set environment variables
PROJECT_ID = os.environ.get('GOOGLE_CLOUD_PROJECT')
REGION = 'us-central1'
MODEL_DISPLAY_NAME = 'gemini-1.5-pro'
ENDPOINT_DISPLAY_NAME = 'gemini-endpoint'

# Initialize the Vertex AI SDK for your project and region
aiplatform.init(project=PROJECT_ID, location=REGION)

# Create an endpoint
endpoint = aiplatform.Endpoint.create(display_name=ENDPOINT_DISPLAY_NAME)
print(f'Created endpoint: {endpoint.resource_name}')

# Upload the model, pointing at its serving container image.
# NOTE: this image URI is a placeholder; substitute the serving
# container image actually available to your project.
GEMINI_CONTAINER_IMAGE_URI = 'gcr.io/google-containers/gemini-1.5-pro:latest'

model = aiplatform.Model.upload(
    display_name=MODEL_DISPLAY_NAME,
    serving_container_image_uri=GEMINI_CONTAINER_IMAGE_URI,
    serving_container_predict_route='/v1/models/gemini:predict',
    serving_container_health_route='/v1/models/gemini'
)

model.deploy(
    endpoint=endpoint,
    deployed_model_display_name=MODEL_DISPLAY_NAME,
    machine_type='n1-standard-4',
    min_replica_count=1,
    max_replica_count=1
)

print(f'Deployed model {MODEL_DISPLAY_NAME} to endpoint {endpoint.resource_name}')
output
Created endpoint: projects/PROJECT_ID/locations/us-central1/endpoints/1234567890123456789
Deployed model gemini-1.5-pro to endpoint projects/PROJECT_ID/locations/us-central1/endpoints/1234567890123456789
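Once the model is deployed, you can send online prediction requests to the endpoint. The payload shape below (`{'prompt': ...}`) is an assumption that depends on the serving container you deployed, and the endpoint resource name is a placeholder, so treat this as a sketch rather than a verified request format:

```python
def build_instances(prompt: str) -> list:
    """Build a prediction payload.

    The {'prompt': ...} schema is an assumption; the actual format
    depends on the serving container behind the endpoint.
    """
    return [{"prompt": prompt}]


instances = build_instances("Summarize the benefits of managed endpoints.")
print(instances)

# With credentials configured, the request itself would look like this
# (endpoint resource name is a placeholder):
#
# from google.cloud import aiplatform
# endpoint = aiplatform.Endpoint(
#     "projects/PROJECT_ID/locations/us-central1/endpoints/1234567890123456789"
# )
# response = endpoint.predict(instances=instances)
# print(response.predictions)
```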

Common variations

  • Use different machine types like n1-highmem-8 for larger workloads.
  • Deploy multiple models to the same endpoint for A/B testing.
  • Use asynchronous prediction requests for batch processing.
  • Deploy other Gemini versions by changing the container image URI.
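For the A/B-testing variation, `Model.deploy` accepts a `traffic_split` mapping of deployed-model IDs to integer percentages that must sum to 100 (the key `'0'` refers to the model being deployed). A small helper for computing that mapping might look like this; the helper itself is illustrative, not part of the SDK:

```python
def make_traffic_split(deployed_model_ids, weights):
    """Turn relative weights into an integer percentage split summing to 100.

    Keys are deployed-model IDs as used by Model.deploy's traffic_split
    argument; '0' stands for the model currently being deployed.
    """
    total = sum(weights)
    split = {
        m: round(100 * w / total)
        for m, w in zip(deployed_model_ids, weights)
    }
    # Push any rounding drift onto the first entry so values sum to 100
    split[deployed_model_ids[0]] += 100 - sum(split.values())
    return split


# Send 90% of traffic to an existing deployed model, 10% to the new one
print(make_traffic_split(["1234567890", "0"], [9, 1]))
# → {'1234567890': 90, '0': 10}
```

The resulting dict can then be passed as `model.deploy(..., traffic_split=split)` when deploying a second model to the same endpoint.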

Troubleshooting

  • If you get authentication errors, ensure your Google Cloud SDK is authenticated with gcloud auth login and your environment variables are set.
  • If deployment fails, check that your account has IAM roles that allow creating and deploying models and endpoints, such as Vertex AI Administrator (roles/aiplatform.admin).
  • Verify the container image URI is correct and accessible.
  • Check quota limits in your Google Cloud project for Vertex AI resources.

Key takeaways

  • Use the Google Cloud Vertex AI Python SDK to deploy Gemini models with managed endpoints.
  • Set correct project, region, and container image URI for Gemini model deployment.
  • Ensure proper authentication and IAM permissions to avoid deployment errors.
Verified 2026-04 · gemini-1.5-pro