What is PEFT in fine-tuning?
PEFT (Parameter-Efficient Fine-Tuning) is a technique that adapts large language models to new tasks by updating only a small fraction of their parameters instead of the entire model. This reduces computational cost and memory usage while preserving model performance.
How it works
PEFT works by freezing most of the pre-trained model's parameters and training only a small set of additional parameters or adapters. Imagine a huge library where instead of rewriting every book, you add sticky notes with updates only where needed. This drastically reduces the resources required for fine-tuning while still allowing the model to learn task-specific nuances.
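The freeze-and-add idea above can be sketched in a few lines. This is a minimal illustration using NumPy, not a real model: the array shapes, the stand-in gradient, and the learning rate are all arbitrary choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pre-trained "base" weight: frozen, never updated during fine-tuning
W_base = rng.standard_normal((4, 4))
W_before = W_base.copy()

# Small trainable component added on top of the frozen base
adapter = np.zeros((4, 4))

def forward(x):
    # The output combines the frozen base weight with the trainable adapter
    return (W_base + adapter) @ x

# One illustrative "training step": only the adapter receives the update
grad = rng.standard_normal((4, 4))  # stand-in for a real gradient
adapter -= 0.1 * grad               # adapter changes...
assert np.array_equal(W_base, W_before)  # ...while the base stays frozen
```

Because only `adapter` changes, checkpointing the fine-tuned task costs a handful of parameters rather than a full copy of the model.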
Common PEFT methods include adding small adapter layers, low-rank updates (LoRA), or prefix tuning, which inject trainable parameters into the model without modifying the original weights.
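For LoRA specifically, the low-rank update has a compact form: the frozen weight `W` is augmented with a scaled product of two small matrices, `W x + (alpha/r) * B(Ax)`. A NumPy sketch (dimensions `d_in`, `d_out`, rank `r`, and scaling `alpha` are illustrative values, not taken from any real model):

```python
import numpy as np

rng = np.random.default_rng(42)
d_in, d_out, r, alpha = 16, 16, 4, 32

W = rng.standard_normal((d_out, d_in))     # frozen pre-trained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, r))                   # trainable, zero init

def lora_forward(x):
    # Frozen base path plus scaled low-rank update: W x + (alpha/r) * B (A x)
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B initialized to zero, LoRA starts as an exact no-op on the base model
assert np.allclose(lora_forward(x), W @ x)

# Trainable parameters: r*(d_in + d_out) in A and B vs d_in*d_out in W
print(r * (d_in + d_out), "vs", d_in * d_out)  # 128 vs 256
```

The zero initialization of `B` is the standard LoRA trick: training begins from the unmodified pre-trained behavior, and the low-rank matrices learn only the task-specific deviation.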
Concrete example
Here is a simple example using the peft library with Hugging Face Transformers to apply LoRA (a popular PEFT method) to a GPT-2 model:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import get_peft_model, LoraConfig

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Configure LoRA parameters
lora_config = LoraConfig(
    r=8,                        # rank of the LoRA matrices
    lora_alpha=32,              # scaling factor
    target_modules=["c_attn"],  # GPT-2 attention modules to adapt
    lora_dropout=0.1,
    bias="none",
)

# Wrap the model with PEFT LoRA
peft_model = get_peft_model(model, lora_config)

# Now only the LoRA parameters are trainable
for name, param in peft_model.named_parameters():
    print(f"{name}: {param.requires_grad}")

# Example input
inputs = tokenizer("Hello, PEFT!", return_tensors="pt")
outputs = peft_model(**inputs)
print(outputs.logits.shape)
```

The loop reports `requires_grad=True` only for the injected LoRA matrices and `False` for every frozen base weight, and the final line prints the logits shape. Abridged output:

```
c_attn.lora_A.weight: True
c_attn.lora_B.weight: True
... (other LoRA params: True)
transformer.wte.weight: False
transformer.h.0.attn.c_attn.weight: False
... (other base model params: False)
torch.Size([1, 4, 50257])
```
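A back-of-envelope count shows how small the trainable fraction is for this LoRA configuration. The numbers below assume GPT-2 small's dimensions (hidden size 768, `c_attn` projecting 768 to 2304 for the fused Q/K/V, 12 transformer layers, roughly 124M total parameters):

```python
# Trainable-parameter count for LoRA with r=8 on c_attn, assuming
# GPT-2 small: hidden size 768, c_attn maps 768 -> 2304 (fused Q/K/V),
# 12 transformer layers, ~124M total parameters.
r = 8
d_in, d_out = 768, 2304
n_layers = 12

per_module = r * d_in + d_out * r  # A: (r, d_in), B: (d_out, r)
total_lora = per_module * n_layers
print(per_module)  # 24576
print(total_lora)  # 294912
print(f"{100 * total_lora / 124_000_000:.2f}% of ~124M params")  # about 0.24%
```

Under a quarter of a percent of the model is trained, which is why LoRA checkpoints are megabytes rather than gigabytes.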
When to use it
Use PEFT when you want to fine-tune large models efficiently on limited hardware or datasets. It is ideal for adapting foundation models to specific tasks without the cost of full fine-tuning.
Do not use PEFT if you require full model retraining for drastically different tasks or if you have abundant compute and want maximum flexibility.
Key terms
| Term | Definition |
|---|---|
| PEFT | Parameter-Efficient Fine-Tuning, updating only a small subset of model parameters. |
| LoRA | Low-Rank Adaptation, a PEFT method injecting low-rank matrices into model layers. |
| Adapter | Small trainable layers added to a frozen model for task adaptation. |
| Prefix tuning | PEFT method that prepends trainable vectors to model inputs. |
| Fine-tuning | Adjusting a pre-trained model's parameters to a new task. |
Key takeaways
- PEFT fine-tunes large models by training only a small fraction of parameters, saving compute and memory.
- Popular PEFT methods include LoRA, adapters, and prefix tuning, each adding lightweight trainable components.
- Use PEFT to efficiently adapt foundation models on limited hardware or data without full retraining.