How to · Beginner · 3 min read

How to use Gemini on Vertex AI

Quick answer
Access Gemini models on Vertex AI with the vertexai Python SDK (shipped in the google-cloud-aiplatform package): initialize it with your Google Cloud project and location, create a GenerativeModel instance with a Gemini model name such as gemini-2.5-pro, then call generate_content() with your prompt to get completions. Authentication uses Application Default Credentials or a service account key.

Prerequisites

  • Python 3.8+
  • Google Cloud project with Vertex AI enabled
  • Google Cloud SDK installed and authenticated (gcloud auth application-default login)
  • pip install google-cloud-aiplatform (provides the vertexai module)
  • Set environment variable GOOGLE_CLOUD_PROJECT
  • Set environment variable GOOGLE_CLOUD_LOCATION (e.g., us-central1)

Setup

Install the SDK (the vertexai module ships in the google-cloud-aiplatform package) and authenticate your Google Cloud environment. Set your project ID and location as environment variables for easy configuration.

bash
pip install --upgrade google-cloud-aiplatform

Step by step

Initialize the Vertex AI SDK, load the Gemini model, and generate text completions with a prompt. This example uses synchronous code and the gemini-2.5-pro model.

python
import os
import vertexai
from vertexai.generative_models import GenerativeModel

# Set your Google Cloud project and location
os.environ["GOOGLE_CLOUD_PROJECT"] = "your-project-id"
os.environ["GOOGLE_CLOUD_LOCATION"] = "us-central1"

# Initialize Vertex AI SDK
vertexai.init(project=os.environ["GOOGLE_CLOUD_PROJECT"], location=os.environ["GOOGLE_CLOUD_LOCATION"])

# Load Gemini model
model = GenerativeModel("gemini-2.5-pro")

# Generate content
response = model.generate_content("Explain quantum computing in simple terms.")

print(response.text)
output
Quantum computing is a type of computing that uses quantum bits, or qubits, which can represent both 0 and 1 at the same time. This allows quantum computers to solve certain problems much faster than traditional computers.

Common variations

You can stream partial outputs as they are generated, or use a different Gemini model such as gemini-2.0-flash for faster responses. The SDK also supports asynchronous calls via generate_content_async(); the example below combines async with streaming.

python
import asyncio
import os
import vertexai
from vertexai.generative_models import GenerativeModel

os.environ["GOOGLE_CLOUD_PROJECT"] = "your-project-id"
os.environ["GOOGLE_CLOUD_LOCATION"] = "us-central1"

vertexai.init(project=os.environ["GOOGLE_CLOUD_PROJECT"], location=os.environ["GOOGLE_CLOUD_LOCATION"])

model = GenerativeModel("gemini-2.0-flash")

async def async_generate():
    # generate_content_async returns an async iterator when stream=True
    responses = await model.generate_content_async("What is AI?", stream=True)
    async for chunk in responses:
        print(chunk.text, end="", flush=True)

asyncio.run(async_generate())
output
Artificial intelligence (AI) is the simulation of human intelligence processes by machines, especially computer systems...

Troubleshooting

  • If you get authentication errors, ensure you have run gcloud auth application-default login and set the correct GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION environment variables.
  • If the model name is invalid, verify the current Gemini model names on the official Google Vertex AI documentation.
  • For quota or permission issues, check your Google Cloud IAM roles and Vertex AI quotas.
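To check whether Application Default Credentials are discoverable before digging further, you can probe them with the google-auth library (installed alongside the SDK). This is a diagnostic sketch, not part of the Vertex AI API; the helper name adc_available is a hypothetical one chosen for this example.

```python
import google.auth
from google.auth.exceptions import DefaultCredentialsError

def adc_available() -> bool:
    """Return True if Application Default Credentials can be located."""
    try:
        # google.auth.default() searches env vars, gcloud config, and metadata
        credentials, project = google.auth.default()
        print(f"ADC found (project: {project})")
        return True
    except DefaultCredentialsError:
        print("No ADC found. Run: gcloud auth application-default login")
        return False

if __name__ == "__main__":
    adc_available()
```

If this prints no project ID, set GOOGLE_CLOUD_PROJECT explicitly or re-run the gcloud login command.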

Key takeaways

  • Use the official vertexai Python SDK to access Gemini models on Vertex AI.
  • Authenticate with Application Default Credentials and set project/location environment variables.
  • Call generate_content() on GenerativeModel instances for text generation.
  • Streaming and async calls are supported for responsive applications.
  • Verify model names and permissions if you encounter errors.
Verified 2026-04 · gemini-2.5-pro, gemini-2.0-flash