How to train a LoRA adapter
Quick answer
To train a LoRA adapter, use the PEFT library with a pretrained base model from transformers. Load the model, configure LoRA parameters via LoraConfig, wrap the model with get_peft_model, then fine-tune using a trainer like transformers.Trainer on your dataset.

Prerequisites
- Python 3.8+
- pip install transformers peft datasets torch
- Access to a pretrained Hugging Face model (e.g., meta-llama/Llama-3.1-8B-Instruct)
- Basic knowledge of PyTorch and the Hugging Face Trainer
Setup
Install required packages and import necessary modules for LoRA training.
pip install transformers peft datasets torch

Step by step
This example shows how to train a LoRA adapter on a text dataset using Hugging Face transformers and peft. It loads a pretrained model, applies LoRA, prepares a dataset, and fine-tunes the adapter.
```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model, TaskType
from datasets import load_dataset

# Load pretrained model and tokenizer
model_name = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="auto", torch_dtype=torch.float16
)

# Configure LoRA
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type=TaskType.CAUSAL_LM,
)

# Wrap model with LoRA
model = get_peft_model(model, lora_config)

# Load a small dataset for fine-tuning
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")

# Tokenize function
def tokenize_function(examples):
    return tokenizer(examples["text"], truncation=True, max_length=512)

tokenized_dataset = dataset.map(tokenize_function, batched=True, remove_columns=["text"])

# Causal-LM collator: pads each batch and copies input_ids into labels,
# which Trainer needs to compute a loss
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

# Training arguments
training_args = TrainingArguments(
    output_dir="./lora-llama",
    per_device_train_batch_size=4,
    num_train_epochs=1,
    logging_steps=10,
    save_steps=10,
    save_total_limit=1,
    fp16=True,
)

# Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=data_collator,
)

# Train LoRA adapter
trainer.train()

# Save LoRA adapter
model.save_pretrained("./lora-llama-adapter")
```

Output
```
***** Running training *****
  Num examples = 288
  Num Epochs = 1
  Instantaneous batch size per device = 4
  Total train batch size (w. parallel, distributed & accumulation) = 4
  Gradient Accumulation steps = 1
  Total optimization steps = 72
...
Training completed. Model saved to ./lora-llama-adapter
```
Common variations
- Use BitsAndBytesConfig to combine LoRA with 4-bit quantization (QLoRA) for memory efficiency.
- Use accelerate or PyTorch Lightning for distributed or multi-GPU setups.
- Apply LoRA to different base models by changing model_name and adjusting target_modules.
Troubleshooting
- If you get CUDA out-of-memory errors, reduce batch size or enable gradient checkpointing.
- Ensure target_modules matches the model architecture; otherwise, LoRA won't apply correctly.
- Check that your transformers, peft, and datasets versions are compatible.
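To find valid target_modules names for an unfamiliar architecture, you can list the suffixes of the Linear layers the model actually contains. A small helper (the name linear_target_names is ours, not a peft API):

```python
import torch.nn as nn

def linear_target_names(model):
    """Return the attribute-name suffixes of every nn.Linear submodule.

    LoraConfig's target_modules matches against these suffixes
    (e.g. "q_proj", "v_proj")."""
    return {
        name.rsplit(".", 1)[-1]
        for name, module in model.named_modules()
        if isinstance(module, nn.Linear)
    }

# Quick check on a toy module; on a real model, call
# linear_target_names(AutoModelForCausalLM.from_pretrained(model_name))
class _ToyAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.q_proj = nn.Linear(8, 8)
        self.v_proj = nn.Linear(8, 8)

names = linear_target_names(_ToyAttention())  # {'q_proj', 'v_proj'}
```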
Key Takeaways
- Use peft with LoraConfig and get_peft_model to efficiently fine-tune large models.
- Combine LoRA with 4-bit quantization (QLoRA) for memory-efficient training on limited hardware.
- Always match target_modules to your model's architecture to ensure LoRA layers are applied correctly.