How to · Beginner · 3 min read

How to track fine-tuning with wandb

Quick answer
Use the wandb Python SDK to initialize a run before fine-tuning your model, log hyperparameters and training metrics during the process, and save model checkpoints as artifacts. This enables real-time tracking and visualization of your fine-tuning experiments.

Prerequisites

  • Python 3.8+
  • wandb account (free tier available)
  • pip install wandb
  • Access to your fine-tuning script or training loop

Setup

Install the wandb Python package and log in to your wandb account to enable experiment tracking.

bash
pip install wandb
wandb login
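
In non-interactive environments (CI jobs, remote training nodes), you can skip the interactive `wandb login` prompt by exporting your API key as an environment variable, which the SDK picks up automatically. The key below is a placeholder; your real key is available at wandb.ai/authorize:

```shell
# Non-interactive authentication for scripts and CI
export WANDB_API_KEY="your-api-key-here"  # placeholder; paste your real key
```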

Step by step

Initialize a wandb run in your fine-tuning script, log hyperparameters and training metrics, and save model checkpoints as artifacts for comprehensive tracking.

python
import wandb

# Initialize a new wandb run
wandb.init(project="fine-tuning-project", entity="your-username", config={
    "learning_rate": 5e-5,
    "epochs": 3,
    "batch_size": 16
})

config = wandb.config

# Example training loop
for epoch in range(config.epochs):
    # Simulate training metric
    train_loss = 0.05 * (config.epochs - epoch)  # dummy decreasing loss
    val_accuracy = 0.8 + 0.05 * epoch  # dummy increasing accuracy

    # Log metrics to wandb
    wandb.log({"train_loss": train_loss, "val_accuracy": val_accuracy, "epoch": epoch})

# Save model checkpoint as artifact
artifact = wandb.Artifact("fine-tuned-model", type="model")
artifact.add_file("./model_checkpoint.pth")  # path to your checkpoint file
wandb.log_artifact(artifact)

# Finish the run
wandb.finish()

output
wandb: Currently logged in as: your-username
wandb: Tracking run with ID: <run_id>
wandb: Run finished. View run at: https://wandb.ai/your-username/fine-tuning-project/runs/<run_id>

Common variations

  • Use wandb.watch(model) to automatically log gradients and model topology during training.
  • Integrate wandb with popular frameworks like PyTorch Lightning or Hugging Face Trainer for seamless tracking.
  • Log additional artifacts such as tokenizers or training datasets for reproducibility.

python
import wandb
import torch

wandb.init(project="fine-tuning-project")

# A small example model; replace with your own fine-tuning model
model = torch.nn.Linear(10, 2)

# Automatically log gradients, parameters, and model topology
wandb.watch(model, log="all")

# Training loop here
# wandb.log(...) as usual

wandb.finish()
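
For the Hugging Face Trainer integration mentioned above, a single config flag is enough to route metrics to wandb. A minimal sketch, assuming `transformers` is installed and your model and dataset are already prepared (`output_dir` and `run_name` values here are illustrative):

```python
# Sketch: route Hugging Face Trainer metrics to wandb
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    report_to="wandb",         # send training metrics to wandb automatically
    run_name="fine-tune-run",  # becomes the wandb run name
    num_train_epochs=3,
    logging_steps=10,
)
# Pass training_args to Trainer(...) along with your model and data;
# the Trainer handles wandb.init() and wandb.log() for you.
```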

Troubleshooting

  • If you see wandb: ERROR API key not found, ensure you have run wandb login or set the WANDB_API_KEY environment variable.
  • If metrics do not appear in your dashboard, verify your internet connection and that wandb.finish() is called at the end of your script.
  • For large model files, use artifact.add_dir() to log directories instead of single files.

Key takeaways

  • Initialize wandb runs with hyperparameters to track fine-tuning configurations.
  • Log training and validation metrics each epoch for real-time monitoring.
  • Save model checkpoints as wandb.Artifact for versioned reproducibility.
  • Use wandb.watch() to automatically log model gradients and topology.
  • Always call wandb.finish() to ensure logs are uploaded.
Verified 2026-04