# What is LoRA fine-tuning?
## How it works
LoRA works by freezing the original model weights and learning small, low-rank matrices that approximate the necessary changes for a new task. Imagine the large model weights as a massive, detailed painting. Instead of repainting the entire canvas, LoRA adds a few transparent overlays that adjust the colors and shapes just enough to create a new image. This approach drastically reduces the number of parameters that need updating, making fine-tuning faster and more resource-efficient.
## Concrete example
Suppose you have a pretrained transformer layer with a weight matrix W of size 1000×1000. Instead of updating W directly, LoRA learns two smaller matrices A and B with shapes 1000×r and r×1000 respectively, where r is a small rank (e.g., 4 or 8). The adapted weight becomes W + AB, which has the same 1000×1000 shape as W. Only A and B are trained, cutting the trainable parameter count from 1,000,000 to 2 × 1000 × r.
```python
import numpy as np

# LoRA parameter update: the base weight matrix W stays frozen;
# only the low-rank factors A and B are trainable.
rng = np.random.default_rng(0)
d = 1000
r = 4  # low rank

W = rng.standard_normal((d, d))         # frozen base weights: (1000, 1000)
A = rng.standard_normal((d, r)) * 0.01  # trainable: (1000, r)
B = np.zeros((r, d))                    # trainable: (r, 1000); zero init keeps W + A @ B == W at the start

input_vector = rng.standard_normal(d)

# Forward pass with LoRA adaptation
output = (W + A @ B) @ input_vector

print(f"LoRA trains {2 * d * r:,} parameters instead of {d * d:,}.")
# LoRA trains 8,000 parameters instead of 1,000,000.
```
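Building on the forward pass above, a single gradient step can be sketched as well. The squared-error loss, learning rate, small dimensions, and zero initialization of B below are illustrative assumptions, not part of any specific library:

```python
import numpy as np

# One LoRA training step on a linear layer y = (W + A @ B) @ x.
# W is frozen; gradients flow only into the low-rank factors A and B.
rng = np.random.default_rng(0)
d, r, lr = 8, 2, 0.1  # small illustrative sizes and learning rate

W = rng.standard_normal((d, d))          # frozen base weights
A = rng.standard_normal((d, r)) * 0.01   # trainable factor
B = np.zeros((r, d))                     # trainable factor (zero init: W + A @ B == W at start)

x = rng.standard_normal(d)
target = rng.standard_normal(d)

# Forward pass and squared-error loss gradient
y = (W + A @ B) @ x
grad_y = 2 * (y - target)                # dL/dy for L = sum((y - target)**2)

# Backward pass: W receives no update
grad_A = np.outer(grad_y, B @ x)         # dL/dA (zero on the very first step, since B starts at 0)
grad_B = np.outer(A.T @ grad_y, x)       # dL/dB
A -= lr * grad_A
B -= lr * grad_B
```

After this step the adapted layer (W + A @ B) produces an output closer to the target, while W itself is untouched.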
## When to use it
Use LoRA fine-tuning when you want to adapt large pretrained models efficiently with limited compute or storage, such as when customizing a model for a domain-specific task or language. It is ideal when full fine-tuning is too costly or impractical. Avoid LoRA when the task demands changes too deep for a low-rank update to capture, or when it requires modifying the model architecture itself.
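To make the storage argument concrete, here is a quick back-of-the-envelope comparison. The hidden dimension, rank, and layer counts are illustrative assumptions, not figures for any particular model:

```python
# Rough storage comparison for adapting a model whose attention layers
# contain d x d projection matrices (all sizes below are assumptions).
d = 4096          # hidden dimension of one projection matrix
r = 8             # LoRA rank
n_layers = 32     # number of adapted layers
n_mats = 4        # query, key, value, output projections per layer

full = n_layers * n_mats * d * d          # parameters updated by full fine-tuning
lora = n_layers * n_mats * 2 * d * r      # parameters updated by LoRA

print(f"full fine-tune: {full:,} params; LoRA: {lora:,} params")
print(f"reduction: {full / lora:.0f}x")
```

With these numbers, each fine-tuned variant of the model needs only the small A and B matrices on disk rather than a full copy of the weights, which is why LoRA adapters are cheap to store and swap.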
## Key terms
| Term | Definition |
|---|---|
| Low-Rank Adaptation (LoRA) | A fine-tuning method that trains low-rank matrices added to frozen model weights. |
| Rank (r) | The small dimension controlling the size of LoRA update matrices. |
| Frozen weights | Model parameters kept fixed during fine-tuning. |
| Parameter-efficient fine-tuning | Techniques that update fewer parameters to reduce cost and storage. |
## Key takeaways
- LoRA fine-tuning trains only small low-rank matrices, not full model weights, saving compute and storage.
- It enables fast adaptation of large models for new tasks without full retraining.
- Use LoRA when you need efficient customization but not when deep model changes are required.