How-to · Intermediate · 4 min read

How to fine-tune an LLM with LoRA using Hugging Face

Quick answer
Use Hugging Face's transformers and peft libraries to fine-tune large language models with LoRA. Load a pretrained model, wrap it with LoRA adapters via peft, tokenize your dataset, and train with Trainer — only the small adapter weights are updated, which keeps memory and storage requirements low.

PREREQUISITES

  • Python 3.8+
  • pip install transformers datasets peft accelerate
  • Hugging Face API token (optional for private models)

Setup

Install the required libraries: transformers for model handling, datasets for data loading, peft for LoRA fine-tuning, and accelerate for efficient training.

bash
pip install transformers datasets peft accelerate

Step by step

This example fine-tunes a GPT-2 model with LoRA on a text dataset using Hugging Face's Trainer. It shows loading the model, applying LoRA, preparing data, and running training.

python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from peft import get_peft_model, LoraConfig, TaskType

# Load tokenizer and model
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Configure LoRA
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=8,             # rank of the low-rank update matrices
    lora_alpha=32,   # scaling factor applied to the updates
    lora_dropout=0.1,
)

# Apply LoRA to the model
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # shows how few parameters are trainable

# Load a small slice of a public dataset
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")

# Tokenize dataset
def tokenize_function(examples):
    return tokenizer(examples["text"], truncation=True, max_length=128)

tokenized_dataset = dataset.map(tokenize_function, batched=True, remove_columns=["text"])

# Collator pads each batch and creates causal-LM labels (mlm=False)
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

# Training arguments
training_args = TrainingArguments(
    output_dir="./lora-finetuned-gpt2",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    logging_steps=10,
    save_steps=50,
    save_total_limit=2,
    evaluation_strategy="no",
    learning_rate=3e-4,
    fp16=True,  # requires a CUDA GPU; set to False on CPU
)

# Initialize Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=data_collator,
)

# Train
trainer.train()

# Save only the LoRA adapter weights (a few MB, not the full model)
model.save_pretrained("./lora-finetuned-gpt2")
output
***** Running training *****
  Num examples = 287
  Num Epochs = 1
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 36

[...training logs...]
Training completed. Model saved to ./lora-finetuned-gpt2

Common variations

  • Use Trainer with evaluation and validation splits for monitoring.
  • Apply LoRA to other model architectures such as bert-base-uncased by changing task_type accordingly (e.g., TaskType.SEQ_CLS for classification).
  • Use accelerate for distributed or mixed precision training.
  • Use streaming datasets (load_dataset(..., streaming=True)) for corpora too large to fit on disk.
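
The first variation — training with a validation split — can be sketched as below. It reuses model, tokenize_function, and tokenized_dataset from the main example; the validation split name follows the wikitext dataset, and the eval_steps value is an arbitrary choice.

```python
from datasets import load_dataset
from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments

# Tokenize a validation slice the same way as the training data
eval_dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="validation[:10%]")
tokenized_eval = eval_dataset.map(tokenize_function, batched=True, remove_columns=["text"])

# Pads batches and creates causal-LM labels
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

training_args = TrainingArguments(
    output_dir="./lora-finetuned-gpt2",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    learning_rate=3e-4,
    evaluation_strategy="steps",  # evaluate periodically instead of "no"
    eval_steps=20,
)

trainer = Trainer(
    model=model,  # the PEFT-wrapped model from the main example
    args=training_args,
    train_dataset=tokenized_dataset,
    eval_dataset=tokenized_eval,
    data_collator=data_collator,
)
```

Trainer then logs eval_loss alongside the training loss, which helps catch overfitting on small datasets.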

Troubleshooting

  • If you get CUDA out-of-memory errors, reduce per_device_train_batch_size or enable gradient checkpointing.
  • Ensure tokenizer padding and truncation settings match your model input requirements.
  • If LoRA adapters are not applied, verify peft version compatibility with your transformers version.
  • For slow training, enable mixed precision with fp16=True in TrainingArguments.
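
The out-of-memory mitigations above can be combined. This is a minimal sketch, assuming the PEFT-wrapped model from the main example; the batch-size and accumulation values are illustrative.

```python
from transformers import TrainingArguments

# Trade compute for memory: recompute activations during the backward pass
model.gradient_checkpointing_enable()
model.enable_input_require_grads()  # needed when combining PEFT with checkpointing

training_args = TrainingArguments(
    output_dir="./lora-finetuned-gpt2",
    per_device_train_batch_size=2,   # smaller per-step batch
    gradient_accumulation_steps=4,   # keeps the effective batch size at 8
    gradient_checkpointing=True,
    fp16=True,                       # requires a CUDA GPU
)
```

Gradient accumulation preserves the effective batch size (2 × 4 = 8), so the optimizer sees roughly the same gradient statistics while peak memory drops.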

Key takeaways

  • Use Hugging Face's peft library to apply LoRA adapters for efficient fine-tuning.
  • Prepare and tokenize your dataset with datasets and transformers tokenizer before training.
  • Configure LoraConfig with appropriate parameters like r, lora_alpha, and lora_dropout.
  • Train with Hugging Face Trainer and save the LoRA adapters separately for inference.
  • Adjust batch size and enable mixed precision to avoid memory issues during fine-tuning.
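
Loading the saved adapters for inference looks like this: attach them to a freshly loaded base model with PeftModel. A minimal sketch, assuming the adapter directory from the training example; the prompt is arbitrary.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach the saved LoRA adapters on top
base = AutoModelForCausalLM.from_pretrained("gpt2")
model = PeftModel.from_pretrained(base, "./lora-finetuned-gpt2")
model.eval()

tokenizer = AutoTokenizer.from_pretrained("gpt2")
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For deployment without the peft dependency, model.merge_and_unload() folds the adapter weights into the base model so it can be saved and served as a plain transformers model.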
Verified 2026-04 · gpt2, bert-base-uncased