Concept · Intermediate · 3 min read

What is LoRA fine-tuning?

Quick answer
Low-Rank Adaptation (LoRA) is a fine-tuning technique that adapts large language models by injecting small trainable low-rank matrices alongside the frozen pretrained weights. Because only these low-rank matrices are trained, LoRA sharply reduces compute and storage costs compared with full fine-tuning while typically matching its quality.

How it works

LoRA works by freezing the original model weights and learning small, low-rank matrices that approximate the necessary changes for a new task. Imagine the large model weights as a massive, detailed painting. Instead of repainting the entire canvas, LoRA adds a few transparent overlays that adjust the colors and shapes just enough to create a new image. This approach drastically reduces the number of parameters that need updating, making fine-tuning faster and more resource-efficient.
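The frozen-weights-plus-overlay idea can be sketched in a few lines. This is a minimal illustration with toy dimensions (d = 8, r = 2 are arbitrary choices, not real model sizes), not a training loop:

python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                             # toy layer dimension and low rank

W = rng.standard_normal((d, d))         # pretrained weight: frozen, never updated
A = rng.standard_normal((r, d)) * 0.01  # trainable low-rank factor
B = np.zeros((d, r))                    # trainable low-rank factor, zero-initialized

x = rng.standard_normal(d)

# Because B starts at zero, B @ A is zero and the adapted layer initially
# behaves exactly like the pretrained one -- fine-tuning starts from the base model.
base = W @ x
adapted = (W + B @ A) @ x
print(np.allclose(base, adapted))  # True

Zero-initializing one of the two factors is what makes the adapter a no-op at the start of training; gradient updates to A and B then gradually shape the overlay.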

Concrete example

Suppose you have a pretrained transformer layer with a weight matrix W of size 1000×1000. Instead of updating W directly, LoRA learns two smaller matrices B and A with shapes 1000×r and r×1000 respectively, where r is a small rank (e.g., 4 or 8). The adapted weight becomes W + BA, a full-size 1000×1000 update expressed through only 2 × 1000 × r numbers. Only A and B are trained; W stays frozen.

python
import numpy as np

# LoRA sketch: the base weight W is frozen; only A and B are trained
d, r = 1000, 4  # layer dimension and low rank

W = np.random.randn(d, d)         # frozen pretrained weight, shape (1000, 1000)
A = np.random.randn(r, d) * 0.01  # trainable factor, shape (r, 1000)
B = np.zeros((d, r))              # trainable factor, shape (1000, r), zero-initialized

x = np.random.randn(d)            # input vector

# Forward pass with LoRA adaptation: (W + B @ A) @ x,
# computed without materializing the full update matrix
output = W @ x + B @ (A @ x)

print(f"LoRA trains {2 * d * r} parameters instead of {d * d}.")
output
LoRA trains 8000 parameters instead of 1000000.

When to use it

Use LoRA fine-tuning when you want to adapt large pretrained models efficiently with limited compute or storage, such as customizing a model for a domain-specific task or language. It is ideal when full fine-tuning is too costly or impractical. Avoid LoRA when the task demands deep changes across all model weights or modifications to the model architecture itself.
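A quick way to judge the efficiency argument is to estimate the trainable fraction before committing. The sketch below uses illustrative dimensions (a 4096×4096 projection, roughly the size of an attention weight in a 7B-class model, with rank 8 — both are assumptions, not measurements):

python
d_in, d_out, r = 4096, 4096, 8   # illustrative layer size and LoRA rank

full = d_in * d_out              # parameters updated by full fine-tuning
lora = r * (d_in + d_out)        # parameters in factors A (r, d_in) and B (d_out, r)

print(f"full: {full:,}  lora: {lora:,}  fraction: {lora / full:.4%}")

For this layer, LoRA trains well under 1% of the parameters full fine-tuning would touch; the fraction shrinks further as the layer grows, since the LoRA cost scales with d while the full cost scales with d².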

Key terms

Term | Definition
Low-Rank Adaptation (LoRA) | A fine-tuning method that trains low-rank matrices added to frozen model weights.
Rank (r) | The small dimension controlling the size of the LoRA update matrices.
Frozen weights | Model parameters kept fixed during fine-tuning.
Parameter-efficient fine-tuning | Techniques that update fewer parameters to reduce cost and storage.

Key Takeaways

  • LoRA fine-tuning trains only small low-rank matrices, not full model weights, saving compute and storage.
  • It enables fast adaptation of large models for new tasks without full retraining.
  • Use LoRA when you need efficient customization but not when deep model changes are required.
Verified 2026-04 · gpt-4o, claude-3-5-sonnet-20241022