Vertex AI supervised fine-tuning guide
Quick answer
Use the google-cloud-aiplatform Python SDK to create a fine-tuning job on Vertex AI by preparing a labeled dataset, configuring a training job (a CustomJob in this guide), and deploying the fine-tuned model. The process involves uploading training data to Google Cloud Storage, defining training parameters, and monitoring the job via the SDK or the Google Cloud Console.

Prerequisites
- Python 3.8+
- Google Cloud project with Vertex AI enabled
- Google Cloud SDK installed and configured
- Service account with Vertex AI permissions
- pip install google-cloud-aiplatform
Setup
Install the google-cloud-aiplatform SDK and set environment variables for authentication and project configuration.
- Enable Vertex AI API in your Google Cloud project.
- Set GOOGLE_APPLICATION_CREDENTIALS to your service account JSON key.
- Install the SDK with pip install google-cloud-aiplatform.

pip install google-cloud-aiplatform

Step by step
This example demonstrates supervised fine-tuning on Vertex AI using a prepared dataset in Google Cloud Storage. It creates a CustomJob to train a model and deploys the fine-tuned model.
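The example assumes a labeled training CSV has already been uploaded to Cloud Storage. A minimal sketch of what building such a file might look like locally (the column names feature_a, feature_b, and label are hypothetical; use whatever schema your trainer expects):

```python
import csv
import io

# Hypothetical labeled rows: two feature columns and one label column
rows = [
    {"feature_a": 1.0, "feature_b": 2.5, "label": 1},
    {"feature_a": 0.3, "feature_b": 4.1, "label": 0},
]

# Write the rows as CSV text, header first
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["feature_a", "feature_b", "label"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
```

The resulting text can be saved and uploaded to the bucket (for example with gsutil cp or the google-cloud-storage client) at the path used for TRAINING_DATA_URI below.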
from google.cloud import aiplatform
import os
# Set your Google Cloud project and region
PROJECT_ID = os.environ.get('GOOGLE_CLOUD_PROJECT')
REGION = 'us-central1'
BUCKET_NAME = 'your-gcs-bucket'
TRAINING_DATA_URI = f'gs://{BUCKET_NAME}/training_data.csv'
# Initialize Vertex AI SDK
aiplatform.init(project=PROJECT_ID, location=REGION)
# Define training job parameters
job_display_name = 'vertex-ai-supervised-finetune'
# Define training container image (example: custom training container or prebuilt)
training_container_image = 'gcr.io/cloud-aiplatform/training/tf-cpu.2-11:latest'
# Define worker pool spec
worker_pool_specs = [
{
'machine_spec': {'machine_type': 'n1-standard-4'},
'replica_count': 1,
'container_spec': {
'image_uri': training_container_image,
'command': [
'python3', 'trainer/task.py',
'--data-path', TRAINING_DATA_URI
]
}
}
]
# Create CustomJob
custom_job = aiplatform.CustomJob(
display_name=job_display_name,
worker_pool_specs=worker_pool_specs
)
# Run training job
custom_job.run(sync=True)
# After training, deploy the model (example assumes model artifact is saved to GCS)
model_display_name = 'vertex-ai-finetuned-model'
model_artifact_uri = f'gs://{BUCKET_NAME}/model/'
model = aiplatform.Model.upload(
display_name=model_display_name,
artifact_uri=model_artifact_uri,
serving_container_image_uri='us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-11:latest'
)
endpoint = model.deploy(machine_type='n1-standard-4')
print(f'Model deployed to endpoint: {endpoint.resource_name}')

Output
Model deployed to endpoint: projects/PROJECT_ID/locations/us-central1/endpoints/1234567890
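After deployment, the endpoint serves online predictions via endpoint.predict(instances=[...]). A sketch of the request payload that call sends (build_prediction_request is a helper written here for illustration, not part of the SDK, and the feature names are hypothetical; instances must match your model's input schema):

```python
def build_prediction_request(instances):
    # Vertex AI online prediction expects a JSON body with an 'instances' list;
    # each entry must match the serving container's expected input schema.
    return {'instances': instances}

# Hypothetical feature values for a single prediction
request = build_prediction_request([{'feature_a': 1.0, 'feature_b': 2.5}])
```

With the SDK, the equivalent call is endpoint.predict(instances=[{'feature_a': 1.0, 'feature_b': 2.5}]), which returns an object whose predictions attribute holds the model outputs.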
Common variations
You can fine-tune using different training frameworks by specifying custom containers or prebuilt containers for PyTorch, TensorFlow, or scikit-learn.
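For example, switching to a PyTorch container only changes the worker pool spec. A sketch, where the image URI and script path are illustrative (check the Vertex AI documentation for the current list of prebuilt training container URIs):

```python
# Illustrative prebuilt PyTorch training image; verify the exact URI and tag
# against the Vertex AI prebuilt containers reference before using it.
pytorch_image = 'us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.1-13:latest'

pytorch_worker_pool_specs = [
    {
        'machine_spec': {'machine_type': 'n1-standard-8'},
        'replica_count': 1,
        'container_spec': {
            'image_uri': pytorch_image,
            # Hypothetical entry point and data path
            'command': [
                'python3', 'trainer/task.py',
                '--data-path', 'gs://your-gcs-bucket/training_data.csv',
            ],
        },
    }
]
```

Passing this list as worker_pool_specs to aiplatform.CustomJob is all that changes; the rest of the job creation and deployment flow stays the same.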
Use asynchronous job execution by setting sync=False in custom_job.run() to monitor progress separately.
For large datasets, use Vertex AI Dataset resources and AutoML training jobs for easier management.
import os
from google.cloud import aiplatform

aiplatform.init(project=os.environ['GOOGLE_CLOUD_PROJECT'], location='us-central1')
# Async training example (reuses worker_pool_specs defined in the previous example)
custom_job = aiplatform.CustomJob(
display_name='async-finetune-job',
worker_pool_specs=worker_pool_specs
)
custom_job.run(sync=False)
print(f'Training job started: {custom_job.resource_name}')

Output
Training job started: projects/PROJECT_ID/locations/us-central1/customJobs/1234567890
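With sync=False the call returns as soon as the job is created, so the script must check progress itself. One way to poll until completion (wait_for_job is a helper sketched here, not part of the SDK; it assumes the job object exposes the SDK's state property, which returns a JobState enum value):

```python
import time

def wait_for_job(job, poll_seconds=60, timeout_seconds=3600):
    """Poll a Vertex AI job object until it reaches a terminal state."""
    terminal = {'JOB_STATE_SUCCEEDED', 'JOB_STATE_FAILED', 'JOB_STATE_CANCELLED'}
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        state = job.state.name  # JobState enum member, e.g. JOB_STATE_RUNNING
        if state in terminal:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError(f'Job did not finish within {timeout_seconds} seconds')
```

Calling wait_for_job(custom_job) after custom_job.run(sync=False) blocks until the job succeeds, fails, or is cancelled, and returns the final state name.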
Troubleshooting
- Authentication errors: Ensure GOOGLE_APPLICATION_CREDENTIALS points to a valid service account JSON key with Vertex AI permissions.
- Permission denied: Verify your service account has roles like Vertex AI Admin and Storage Object Viewer.
- Training job fails: Check logs in the Google Cloud Console under Vertex AI > Training jobs for detailed error messages.
- Model deployment issues: Confirm the model artifact path is correct and the serving container image matches your model framework.
Key Takeaways
- Use the official google-cloud-aiplatform SDK to manage supervised fine-tuning jobs on Vertex AI.
- Prepare and upload your labeled training data to Google Cloud Storage before starting a fine-tuning job.
- Monitor training jobs asynchronously and deploy fine-tuned models with appropriate serving containers.
- Ensure proper IAM permissions and authentication setup to avoid common errors.
- Customize training with different containers or frameworks by adjusting the worker pool specs.