How-to · Intermediate · 4 min read

How to use SFTTrainer for fine-tuning

Quick answer
Use the SFTTrainer class from Hugging Face's TRL library to fine-tune large language models on your custom dataset. Prepare your dataset in the required format, instantiate SFTTrainer with your model and training parameters, then call train() to start fine-tuning.

PREREQUISITES

  • Python 3.8+
  • pip install transformers datasets accelerate trl
  • Access to a pretrained model checkpoint (e.g., from Hugging Face Hub)
  • Basic knowledge of PyTorch

Setup

Install the necessary packages: transformers provides the models and the Trainer class, datasets handles data loading, trl provides SFTTrainer, and accelerate adds distributed-training support.

bash
pip install transformers datasets accelerate trl

Step by step

This example fine-tunes a GPT-2 model on the wikitext-2 text dataset using the Hugging Face Trainer; trl's SFTTrainer follows the same pattern while automating most of the data preparation shown below.

python
from transformers import (
    DataCollatorForLanguageModeling,
    GPT2LMHeadModel,
    GPT2Tokenizer,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

# Load tokenizer and model
model_name = "gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained(model_name)

# Load dataset (example: wikitext-2)
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")

# Tokenize function
def tokenize_function(examples):
    return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=128)

# Tokenize dataset and drop the raw text column
tokenized_datasets = dataset.map(tokenize_function, batched=True, remove_columns=["text"])

# Collator that copies input_ids into labels for causal language modeling,
# so the Trainer can compute the loss
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
    save_total_limit=2,
    save_steps=500,
    logging_dir="./logs",
    logging_steps=100,
)

# Initialize the Trainer. trl's SFTTrainer extends this class and automates
# the tokenization, padding, and label setup done manually above.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    tokenizer=tokenizer,
    data_collator=data_collator,
)

# Start fine-tuning
trainer.train()
output
***** Running training *****
  Num examples = 36718
  Num Epochs = 3
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 13767

[...training logs...]
Training completed. Model saved to ./results
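The "Total optimization steps" figure in the log follows directly from the dataset size, per-device batch size, and epoch count. A back-of-envelope check (ignoring the trailing partial batch, and assuming a single device with no gradient accumulation, as the log states):

```python
num_examples = 36_718  # "Num examples" in the log
batch_size = 8         # per_device_train_batch_size
num_epochs = 3

steps_per_epoch = num_examples // batch_size  # trailing partial batch ignored
total_steps = steps_per_epoch * num_epochs
print(total_steps)  # 13767, matching "Total optimization steps" above
```

The same arithmetic tells you how halving the batch size or adding gradient accumulation will change your step count and checkpoint schedule.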

Common variations

  • Use SFTTrainer with different base models like gpt-neo or llama.
  • Enable mixed precision training with fp16=True in TrainingArguments for faster training.
  • Use custom datasets by formatting your data as JSON or CSV and loading with datasets.load_dataset.
  • Run distributed or multi-GPU training with the accelerate launch CLI.

Troubleshooting

  • If you get CUDA out of memory errors, reduce batch size or sequence length.
  • Ensure your dataset is properly tokenized and padded to avoid shape mismatches.
  • Check that your environment has compatible versions of transformers and datasets.
  • Point logging_dir at a directory and inspect the logs written there (e.g., with TensorBoard) to monitor training progress and debug issues.
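The shape-mismatch bullet comes down to every sequence in a batch having the same length. A tokenizer-free sketch of the truncate-then-pad logic that padding="max_length" performs for you:

```python
def pad_batch(sequences, max_length, pad_id=0):
    """Truncate each token-id sequence to max_length, then right-pad with pad_id."""
    padded = []
    for seq in sequences:
        seq = seq[:max_length]
        padded.append(seq + [pad_id] * (max_length - len(seq)))
    return padded

batch = pad_batch([[11, 22, 33, 44, 55], [66, 77]], max_length=4)
print(batch)  # [[11, 22, 33, 44], [66, 77, 0, 0]] -- uniform shape, ready to batch
```

If shapes still mismatch after tokenization, check that every split went through the same map() call and that no raw string columns survive into the collator.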

Key Takeaways

  • Prepare and tokenize your dataset to match the model input requirements before fine-tuning.
  • Use trl's SFTTrainer or the Hugging Face Trainer with appropriate training arguments for supervised fine-tuning.
  • Adjust batch size, learning rate, and epochs based on your compute resources and dataset size.
  • Enable mixed precision and distributed training for efficiency on modern GPUs.
  • Monitor logs and save checkpoints regularly to avoid losing progress.
Verified 2026-04 · gpt2, gpt-neo, llama