# What is LoRA fine-tuning?
## How it works
LoRA works by freezing the original model weights and learning small, low-rank matrices that approximate the necessary changes for a new task. Imagine the large model weights as a massive, detailed painting. Instead of repainting the entire canvas, LoRA adds a few transparent overlays that adjust the colors and shapes just enough to create a new image. This approach drastically reduces the number of parameters that need updating, making fine-tuning faster and more resource-efficient.
## Concrete example
Suppose you have a pretrained transformer layer with a weight matrix W of size 1000×1000. Instead of updating W directly, LoRA learns two smaller matrices A and B with shapes 1000×r and r×1000 respectively, where r is a small rank (e.g., 4 or 8). The adapted weight becomes W + AB, which has the same 1000×1000 shape as W. Only A and B are trained, cutting the trainable parameter count from 1,000,000 to 2 × 1000 × r.
```python
import numpy as np

# LoRA parameter update: the base weight matrix W stays frozen;
# only the low-rank factors A and B are trainable.
rng = np.random.default_rng(0)
d = 1000
r = 4  # low rank

W = rng.standard_normal((d, d))         # frozen base weights: (1000, 1000)
A = rng.standard_normal((d, r)) * 0.01  # trainable: (1000, r)
B = np.zeros((r, d))                    # trainable: (r, 1000); zero init keeps W + A @ B == W at the start

input_vector = rng.standard_normal(d)

# Forward pass with LoRA adaptation
output = (W + A @ B) @ input_vector

print(f"LoRA trains {2 * d * r:,} parameters instead of {d * d:,}.")
# LoRA trains 8,000 parameters instead of 1,000,000.
```
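Building on the forward pass above, a single gradient step can be sketched as well. The squared-error loss, learning rate, small dimensions, and zero initialization of B below are illustrative assumptions, not part of any specific library:

```python
import numpy as np

# One LoRA training step on a linear layer y = (W + A @ B) @ x.
# W is frozen; gradients flow only into the low-rank factors A and B.
rng = np.random.default_rng(0)
d, r, lr = 8, 2, 0.1  # small illustrative sizes and learning rate

W = rng.standard_normal((d, d))          # frozen base weights
A = rng.standard_normal((d, r)) * 0.01   # trainable factor
B = np.zeros((r, d))                     # trainable factor (zero init: W + A @ B == W at start)

x = rng.standard_normal(d)
target = rng.standard_normal(d)

# Forward pass and squared-error loss gradient
y = (W + A @ B) @ x
grad_y = 2 * (y - target)                # dL/dy for L = sum((y - target)**2)

# Backward pass: W receives no update
grad_A = np.outer(grad_y, B @ x)         # dL/dA (zero on the very first step, since B starts at 0)
grad_B = np.outer(A.T @ grad_y, x)       # dL/dB
A -= lr * grad_A
B -= lr * grad_B
```

After this step the adapted layer (W + A @ B) produces an output closer to the target, while W itself is untouched.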
## When to use it
Use LoRA fine-tuning when you want to adapt large pretrained models efficiently with limited compute or storage, such as when customizing a model for a domain-specific task or language. It is ideal when full fine-tuning is too costly or impractical. Avoid LoRA when the task demands changes too deep for a low-rank update to capture, or when it requires modifying the model architecture itself.
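To make the storage argument concrete, here is a quick back-of-the-envelope comparison. The hidden dimension, rank, and layer counts are illustrative assumptions, not figures for any particular model:

```python
# Rough storage comparison for adapting a model whose attention layers
# contain d x d projection matrices (all sizes below are assumptions).
d = 4096          # hidden dimension of one projection matrix
r = 8             # LoRA rank
n_layers = 32     # number of adapted layers
n_mats = 4        # query, key, value, output projections per layer

full = n_layers * n_mats * d * d          # parameters updated by full fine-tuning
lora = n_layers * n_mats * 2 * d * r      # parameters updated by LoRA

print(f"full fine-tune: {full:,} params; LoRA: {lora:,} params")
print(f"reduction: {full / lora:.0f}x")
```

With these numbers, each fine-tuned variant of the model needs only the small A and B matrices on disk rather than a full copy of the weights, which is why LoRA adapters are cheap to store and swap.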
## Key terms
| Term | Definition |
|---|---|
| Low-Rank Adaptation (LoRA) | A fine-tuning method that trains low-rank matrices added to frozen model weights. |
| Rank (r) | The small dimension controlling the size of LoRA update matrices. |
| Frozen weights | Model parameters kept fixed during fine-tuning. |
| Parameter-efficient fine-tuning | Techniques that update fewer parameters to reduce cost and storage. |
## Key takeaways
- LoRA fine-tuning trains only small low-rank matrices, not full model weights, saving compute and storage.
- It enables fast adaptation of large models for new tasks without full retraining.
- Use LoRA when you need efficient customization but not when deep model changes are required.