How to fine-tune on free GPU
Quick answer
You can fine-tune models on free GPUs by using cloud platforms like
Google Colab or Kaggle Kernels that provide limited GPU access. Use frameworks like Hugging Face Transformers with PyTorch or TensorFlow to run fine-tuning scripts within these environments.

Prerequisites
- Python 3.8+
- A Google account (for Colab) or a Kaggle account
- pip install transformers datasets accelerate
- Basic knowledge of PyTorch or TensorFlow
Set up a free GPU environment
Use Google Colab or Kaggle Kernels to access free GPUs. Both platforms provide limited GPU time (usually Tesla T4 or P100) and RAM. Start by creating a new notebook and enabling GPU acceleration in the runtime settings.
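Before installing anything, it helps to confirm the notebook actually sees a GPU. A minimal check with PyTorch (assuming torch is preinstalled in the runtime, as it is by default on Colab and Kaggle):

```python
import torch

# Report whether a CUDA GPU is visible to PyTorch in this runtime.
def gpu_report() -> str:
    if torch.cuda.is_available():
        return f"GPU detected: {torch.cuda.get_device_name(0)}"
    return "No GPU detected - check the runtime/accelerator settings"

print(gpu_report())
```

If no GPU is reported, switch the runtime type (in Colab: Runtime > Change runtime type > GPU) and re-run the cell.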
Install necessary libraries with pip:
%%capture
!pip install transformers datasets accelerate

Step-by-step fine-tuning code
This example fine-tunes a Hugging Face distilbert-base-uncased model on a text classification task using the datasets library and Trainer API. It runs efficiently on free GPUs.
from transformers import AutoModelForSequenceClassification, AutoTokenizer, Trainer, TrainingArguments
from datasets import load_dataset

# Load the MRPC paraphrase-detection dataset from the GLUE benchmark
raw_datasets = load_dataset('glue', 'mrpc')

# Load tokenizer and model
model_name = 'distilbert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tokenize sentence pairs; truncation keeps sequences within the model's limit
def tokenize_function(examples):
    return tokenizer(examples['sentence1'], examples['sentence2'], truncation=True)

# Tokenize datasets
tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)

# Training arguments
training_args = TrainingArguments(
    output_dir='./results',
    evaluation_strategy='epoch',
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    push_to_hub=False,
    logging_dir='./logs'
)

# Initialize Trainer; passing the tokenizer enables dynamic padding per batch
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['validation'],
    tokenizer=tokenizer
)

# Train model
trainer.train()

Output

***** Running training *****
  Num examples = 3668
  Num Epochs = 3
  Instantaneous batch size per device = 16
  Total optimization steps = 687
  ...
Training completed. Model saved to ./results
Common variations
- Use accelerate to optimize multi-GPU or mixed-precision training.
- Switch to a larger model such as bert-base-uncased or roberta-base if GPU memory allows.
- Use Trainer callbacks for early stopping or custom evaluation.
- Run asynchronously or with streaming logs in Colab for better monitoring.
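The callbacks variation can be sketched with the built-in EarlyStoppingCallback. The patience and threshold values below are illustrative, and the callback assumes load_best_model_at_end=True and a metric_for_best_model are also set in TrainingArguments:

```python
from transformers import EarlyStoppingCallback

# Stop training once the monitored metric fails to improve for
# `early_stopping_patience` consecutive evaluations.
early_stop = EarlyStoppingCallback(
    early_stopping_patience=2,     # evaluations without improvement before stopping
    early_stopping_threshold=0.0,  # minimum change that counts as improvement
)

# Attach it to the Trainer from the main example:
# trainer = Trainer(model=model, args=training_args, ..., callbacks=[early_stop])
```

This saves free-tier GPU minutes by cutting off epochs that no longer improve the validation metric.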
Troubleshooting tips
- If you get a CUDA out of memory error, reduce the batch size or use gradient accumulation.
- If the GPU is not detected, ensure the runtime type is set to GPU in the Colab or Kaggle settings.
- Free GPUs have usage limits; if disconnected, wait or upgrade to paid tiers.
- Use !nvidia-smi in a notebook cell to verify GPU availability.
!nvidia-smi output

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.08    Driver Version: 510.73.08    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------|
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   55C    P0    30W /  70W |    500MiB / 15109MiB |     10%      Default |
+-------------------------------+----------------------+----------------------+
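The out-of-memory tip relies on gradient accumulation, which Trainer exposes via the gradient_accumulation_steps argument of TrainingArguments. A framework-free sketch of the idea, using hypothetical gradient values:

```python
# Gradient accumulation: average gradients from several small micro-batches
# before taking one optimizer step, mimicking a larger batch in less memory.
micro_batch_size = 4   # what actually fits on the free GPU
accum_steps = 8        # micro-batches accumulated per optimizer step
effective_batch_size = micro_batch_size * accum_steps

# Hypothetical per-micro-batch gradients for a single scalar parameter
micro_grads = [0.50, 0.30, 0.40, 0.60, 0.20, 0.50, 0.40, 0.30]

# Each micro-batch gradient is scaled by 1/accum_steps before summing,
# so the result equals the mean gradient of one large batch.
accumulated = sum(g / accum_steps for g in micro_grads)

print(effective_batch_size)   # 32
print(round(accumulated, 4))  # 0.4 (the mean of micro_grads)
```

In the main example, setting per_device_train_batch_size=4 and gradient_accumulation_steps=4 keeps the effective batch size at 16 while using roughly a quarter of the activation memory per step.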
Key Takeaways
- Use Google Colab or Kaggle for free GPU access to fine-tune models without local hardware.
- Leverage Hugging Face Transformers and Trainer API for simple, efficient fine-tuning workflows.
- Adjust batch size and epochs to fit within free GPU memory and time limits.
- Verify GPU availability with !nvidia-smi and set runtime to GPU before training.
- Free GPUs have usage limits; plan training sessions accordingly or consider paid options for heavy workloads.