LoRA configuration defaults
Why this matters
LoRA shrinks the trainable state of fine-tuning (gradients and optimizer state) from gigabytes to megabytes by updating only small adapter matrices instead of the full model. The frozen base weights still have to fit in memory. Knowing sensible default values lets you start training without a long hyperparameter search, and tells you when to override them.
Explanation
What it is: A LoRA configuration is a Python object that tells the PEFT library how to inject low-rank adapter matrices into your model. Instead of updating all 7 billion parameters in Llama-2-7B, LoRA updates only the adapter matrices: roughly 4 million parameters with r=8 on q_proj and v_proj, well under 0.1% of the model. That saves memory and time.
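The adapter count above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming Llama-2-7B has 32 decoder layers and 4096x4096 q_proj and v_proj weights; the helper names are hypothetical and not part of PEFT.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters in one adapter pair: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r


# Assumed Llama-2-7B shapes (not taken from the library)
HIDDEN = 4096
NUM_LAYERS = 32
TARGETS_PER_LAYER = 2  # q_proj and v_proj


def total_lora_params(r: int) -> int:
    """Adapter parameters summed over every targeted module in the model."""
    per_module = lora_param_count(HIDDEN, HIDDEN, r)
    return per_module * TARGETS_PER_LAYER * NUM_LAYERS


for r in (8, 16, 32):
    print(f"r={r:>2}: {total_lora_params(r):,} trainable adapter parameters")
```

The count grows linearly with r and with the number of targeted modules, so adding k_proj and o_proj to the targets would double it.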
How it works mechanically: The LoraConfig object from PEFT specifies which layers get adapters (target_modules), the rank of the low-rank matrices (r, typically 8–32), the alpha scaling factor (lora_alpha), and dropout regularization (lora_dropout). When you pass this config to get_peft_model(), PEFT injects the adapter matrices into the layers named in target_modules, usually the attention projections and optionally the MLP layers. During training, only these adapter weights update; the original model weights stay frozen.
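To make the mechanics concrete, here is a minimal pure-Python sketch of what an adapted layer computes: the frozen output W x plus a scaled low-rank update B (A x). The shapes, the helper names, and the small random init for A are illustrative assumptions, not PEFT's implementation.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, r, lora_alpha):
    """h = W x + (lora_alpha / r) * B (A x); W is never modified."""
    scaling = lora_alpha / r
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scaling * u for b, u in zip(base, update)]


random.seed(0)
d_in, d_out, r, lora_alpha = 6, 4, 2, 4
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen base weight
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # trainable, random init
B = [[0.0] * r for _ in range(d_out)]                                  # trainable, zero init
x = [1.0] * d_in

# With B at zero, the adapted layer reproduces the base layer exactly
h_before = lora_forward(W, A, B, x, r, lora_alpha)
assert h_before == matvec(W, x)

# Simulate a training update to one adapter weight; only B changes, W is untouched
B[0][0] = 0.5
h_after = lora_forward(W, A, B, x, r, lora_alpha)
print("first output before:", h_before[0])
print("first output after: ", h_after[0])
```

Because B starts at zero, a freshly wrapped model behaves identically to the base model, and training moves it away from that starting point only through A and B.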
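When to use defaults: Start with the values shown in the code below (r=8, lora_alpha=16, lora_dropout=0.05). These are common starting points rather than LoraConfig's built-in defaults, which leave lora_alpha at 8 and lora_dropout at 0.0. Change r if you hit memory limits or the adapter underfits, and adjust lora_alpha if training is unstable. These values usually fit a 7B model on a 24GB GPU; 13B models generally also need a quantized base model.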
Analogy
LoRA is like editing a Wikipedia article with a sticky note attached to each page instead of rewriting the entire page. The sticky note (adapter) is tiny and cheap to write, but it modifies how readers interpret the original text when both are read together.
Code
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Starting LoRA configuration.
# GPT-2 has no q_proj/v_proj: its fused attention projection is named "c_attn".
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["c_attn"],
    fan_in_fan_out=True,  # GPT-2 uses Conv1D layers, which store weights transposed
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

print("LoRA Config:")
print(f" Rank (r): {lora_config.r}")
print(f" Alpha: {lora_config.lora_alpha}")
# sorted() gives a stable list whether PEFT stores the targets as a list or a set
print(f" Target modules: {sorted(lora_config.target_modules)}")
print(f" Dropout: {lora_config.lora_dropout}")
print(f" Bias: {lora_config.bias}")

# Load a small model to demonstrate
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Count parameters before applying LoRA: get_peft_model() modifies the model in place
original_params = sum(p.numel() for p in model.parameters())

# Apply LoRA to the model
peft_model = get_peft_model(model, lora_config)
trainable_params = sum(p.numel() for p in peft_model.parameters() if p.requires_grad)
total_params = sum(p.numel() for p in peft_model.parameters())

print(f"\nOriginal model parameters: {original_params:,}")
print(f"Trainable parameters after LoRA: {trainable_params:,}")
print(f"Total PEFT model parameters: {total_params:,}")
print(f"\nTrainable parameter reduction: {(1 - trainable_params / original_params) * 100:.1f}%")

Output
LoRA Config:
 Rank (r): 8
 Alpha: 16
 Target modules: ['c_attn']
 Dropout: 0.05
 Bias: none

Original model parameters: 124,439,808
Trainable parameters after LoRA: 294,912
Total PEFT model parameters: 124,734,720

Trainable parameter reduction: 99.8%
What just happened?
We created a LoRA configuration with common starting values, then applied it to GPT-2 using <code>get_peft_model()</code>. This wrapped the targeted attention projection layers with small adapter matrices. The original 124M parameters remain frozen; only the adapter parameters, a fraction of a percent of the total, become trainable. The forward pass still runs through every weight, but gradients and optimizer state are kept only for the adapters, which is what lets fine-tuning fit on consumer GPUs.
Common gotcha
Developers often forget that target_modules must match the actual layer names in your specific model. For Llama use ["q_proj", "v_proj", "k_proj", "o_proj"]; GPT-2 uses c_attn instead, and Mistral or newer models sometimes differ again. If you get an error saying the target modules were not found, list your model's layer names first with for name, _ in model.named_modules(): print(name). Note that print(model.named_parameters()) only prints a generator object, not the names.
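A quick pre-flight check can catch a name mismatch before training starts. The sketch below is a hypothetical helper, not a PEFT function. It assumes a target matches when a module name equals it or ends with "." plus the target, which approximates how a list of names is matched; the module names are made up to look like a GPT-2 style model.

```python
def unmatched_targets(module_names, target_modules):
    """Return the targets that match no module name.

    Assumption: a target matches when a module name equals it or ends
    with "." + target (an approximation of list-style matching).
    """
    names = list(module_names)
    return [
        target
        for target in target_modules
        if not any(n == target or n.endswith("." + target) for n in names)
    ]


# Hypothetical names, shaped like [name for name, _ in model.named_modules()]
gpt2_like = [
    "transformer.h.0.attn.c_attn",
    "transformer.h.0.attn.c_proj",
    "transformer.h.0.mlp.c_fc",
    "transformer.h.0.mlp.c_proj",
]

print(unmatched_targets(gpt2_like, ["q_proj", "v_proj"]))  # ['q_proj', 'v_proj']
print(unmatched_targets(gpt2_like, ["c_attn"]))            # []
```

With a real model, pass [name for name, _ in model.named_modules()] as the first argument. A non-empty result means those targets need renaming.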
Error recovery
ValueError: 'target_modules' not found in model
RuntimeError: Expected all tensors to be on the same device
AttributeError: 'LoraConfig' object has no attribute 'xyz'
Experienced dev note
The lora_alpha parameter is the sneaky one. It's not just a scaling factor: the adapter's output is multiplied by lora_alpha / r, so it acts like a learning-rate multiplier for the adapters relative to the frozen base model. Set lora_alpha = 2 * r as a rule of thumb (so if r=8, use lora_alpha=16), which keeps that multiplier at 2 whenever you change r. If your loss doesn't move, bump lora_alpha up; if it's chaotic, lower it. This one parameter saves you from 'why isn't my model learning' debugging hell.
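The interaction between r and lora_alpha is easier to see as numbers. This small sketch assumes the standard LoRA scale of lora_alpha / r and compares a fixed lora_alpha against the 2 * r rule of thumb; lora_scaling is an illustrative helper, not a PEFT function.

```python
def lora_scaling(r: int, lora_alpha: float) -> float:
    """Multiplier applied to the adapter output in standard LoRA."""
    return lora_alpha / r


for r in (4, 8, 16, 32):
    fixed = lora_scaling(r, 16)        # lora_alpha held at 16
    ruled = lora_scaling(r, 2 * r)     # lora_alpha = 2 * r rule of thumb
    print(f"r={r:>2}  alpha=16 -> scale {fixed:.2f}   alpha=2r -> scale {ruled:.2f}")
```

With lora_alpha fixed, raising r quietly shrinks the adapter's contribution; tying lora_alpha to r keeps it constant.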
Check your understanding
Your training loop crashes with 'CUDA out of memory' on a 24GB GPU. You adjust the LoRA config from r=16 to r=8, and now it works. Why did this fix the problem, and why didn't you need to change the learning rate?
Answer hint
The correct answer must explain that lowering <code>r</code> reduces the number of trainable parameters, and with them the gradients and optimizer state (such as Adam's two moment buffers), lowering peak memory. The learning rate doesn't need to change because the failure was a memory problem, not a convergence problem. One caveat: if <code>lora_alpha</code> stays fixed while <code>r</code> halves, the adapter scale <code>lora_alpha / r</code> doubles, so watch the loss for instability. (A common wrong answer: 'smaller r trains faster'. It doesn't meaningfully; it mainly uses less memory.)
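To put rough numbers on the memory side of the answer, the sketch below estimates the adapter-related training state for r=16 and r=8. It assumes Llama-2-7B-like shapes (hidden size 4096, 32 layers, q_proj and v_proj targeted) and 16 bytes per trainable parameter (fp32 weight, fp32 gradient, two Adam moments). It deliberately ignores the frozen base weights and activations, which usually dominate total memory.

```python
# Assumed cost per trainable parameter: fp32 weight + fp32 gradient + two Adam moments
BYTES_PER_TRAINABLE_PARAM = 4 + 4 + 8


def adapter_training_bytes(r, hidden=4096, layers=32, targets_per_layer=2):
    """Bytes of weights, gradients, and Adam state for the adapters only."""
    params = r * (hidden + hidden) * targets_per_layer * layers
    return params * BYTES_PER_TRAINABLE_PARAM


for r in (16, 8):
    mib = adapter_training_bytes(r) / 2**20
    print(f"r={r:>2}: about {mib:.0f} MiB of adapter weights, gradients, and optimizer state")
```

Halving r halves this adapter state, which can be enough to tip a run that was right at the memory limit.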
Version note: PEFT's target_modules has moved from Optional[List[str]] to also accepting string patterns for newer models. If using PEFT < 0.11.0, list module names explicitly. Verify your version with import peft; print(peft.__version__).