Code Beginner easy · 4 min

LoRA configuration defaults

What you will learn

LoRA configuration objects define how parameter-efficient fine-tuning modifies your model, with sensible defaults that work for most use cases.

Why this matters

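LoRA cuts the memory spent on gradients and optimizer state by updating only tiny adapter matrices instead of the full model. The base weights still have to be loaded, but the trained adapter you save is measured in megabytes rather than gigabytes. Understanding the defaults means you can start training immediately instead of spending days on hyperparameter tuning, and you will know when to override them.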

Skip if: you need to change the model's behavior more broadly than small adapters allow (for example, adding new token types, which also requires training the embedding layer), or you have ample GPU memory and want maximum quality, in which case full fine-tuning may be better. Also skip LoRA if your model uses custom layer types that PEFT cannot attach adapters to.

Explanation

What it is: A LoRA configuration is a Python object that tells the PEFT library how to inject low-rank adapter matrices into your model. Instead of updating all ~7 billion parameters in Llama-2-7B, LoRA with r=8 on the query and value projections trains about 4.2 million parameters (roughly 0.06% of the model), saving memory and time.
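You can check how many parameters LoRA trains for Llama-2-7B with a quick calculation. This is a minimal sketch, assuming Llama-2-7B's published shape (32 decoder layers, with q_proj and v_proj both mapping 4096 to 4096) and that each adapted layer gets an A matrix of shape r × d_in and a B matrix of shape d_out × r. The helper name lora_param_count is hypothetical, not part of PEFT.

```python
def lora_param_count(num_layers: int, layer_shapes: list, r: int) -> int:
    """Trainable LoRA parameters when every listed (d_in, d_out) layer gets an adapter.

    Each adapter adds A (r x d_in) and B (d_out x r), i.e. r * (d_in + d_out) values.
    """
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in layer_shapes)
    return num_layers * per_layer


# Assumed Llama-2-7B shape: 32 decoder layers, q_proj and v_proj both 4096 -> 4096.
llama2_7b_qv = lora_param_count(32, [(4096, 4096), (4096, 4096)], r=8)
print(f"Trainable LoRA parameters: {llama2_7b_qv:,}")  # Trainable LoRA parameters: 4,194,304
print(f"Share of ~6.74B parameters: {llama2_7b_qv / 6.74e9:.4%}")
```

The same formula applies to any model once you know the input and output size of each targeted layer.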

How it works mechanically: The LoraConfig object from PEFT specifies which layers get adapters (target_modules), the rank of the low-rank matrices (r, typically 8–32), the alpha scaling factor (lora_alpha), and the dropout applied to the adapter input (lora_dropout). When you pass this config to get_peft_model(), PEFT attaches a pair of small matrices to every layer named in target_modules, which can be attention projections, MLP layers, or both. During training, only these adapter weights update; the original model weights stay frozen.
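To make the mechanics concrete, here is a minimal NumPy sketch of what an adapted layer computes: the frozen weight W plus a scaled low-rank update, h = W x + (lora_alpha / r) · B A x. It follows the LoRA paper's initialization (random A, zero B), omits dropout, and uses arbitrary layer sizes; it illustrates the idea and is not PEFT's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out = 1024, 1024    # arbitrary layer size for illustration
r, lora_alpha = 8, 16
scaling = lora_alpha / r    # 2.0

W = rng.normal(size=(d_out, d_in))          # frozen pretrained weight (never updated)
A = rng.normal(scale=0.01, size=(r, d_in))  # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init


def lora_forward(x: np.ndarray) -> np.ndarray:
    """Frozen path plus scaled low-rank update: W x + (alpha / r) * B (A x)."""
    return W @ x + scaling * (B @ (A @ x))


x = rng.normal(size=d_in)

# With B = 0 the adapted layer starts out identical to the frozen one.
print(np.allclose(lora_forward(x), W @ x))  # True

# Only A and B would be trained: far fewer values than W.
print(f"frozen: {W.size:,}  trainable: {A.size + B.size:,}")  # frozen: 1,048,576  trainable: 16,384
```

Starting B at zero means training begins from exactly the pretrained model's behavior, and the update can only grow from there.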

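When to use defaults: Start with a standard configuration such as the one in the Code section below, and tune r and lora_alpha only if you hit memory limits, see training instability, or the loss stops improving. Keep in mind that the frozen base weights, not the adapters, dominate memory: in 16-bit precision a 7B model needs about 14GB for its weights, which leaves headroom on a 24GB GPU, while a 13B model needs about 26GB and won't fit on that GPU without further savings such as quantization.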
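Whether the frozen base model fits on a given GPU is mostly a matter of multiplying parameter count by bytes per parameter. A rough sketch, assuming 2 bytes per parameter for 16-bit (fp16/bf16) weights and ignoring activations, gradients, optimizer state, and framework overhead, so real usage will be higher:

```python
GPU_MEMORY_GB = 24  # the 24GB card discussed above


def weight_memory_gb(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate memory for the frozen base weights, in GB (1e9 bytes).

    Assumes 16-bit weights (2 bytes each) by default; activations, gradients,
    optimizer state and framework overhead are NOT included.
    """
    return num_params * bytes_per_param / 1e9


for name, num_params in [("7B", 7e9), ("13B", 13e9)]:
    gb = weight_memory_gb(num_params)
    verdict = "under" if gb < GPU_MEMORY_GB else "over"
    print(f"{name}: ~{gb:.0f} GB of 16-bit weights ({verdict} the {GPU_MEMORY_GB} GB budget before activations)")
```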

Analogy

LoRA is like attaching a sticky note to each page of a reference book instead of rewriting the pages themselves. The sticky note (adapter) is tiny and cheap to write, but when the page and the note are read together, the note changes how the original text is interpreted.

Code

python

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# A common starting LoRA configuration. (PEFT's own LoraConfig defaults differ,
# e.g. lora_alpha=8 and lora_dropout=0.0, so set these explicitly.)
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["c_attn"],  # GPT-2's fused query/key/value projection
    fan_in_fan_out=True,        # GPT-2 implements c_attn as a Conv1D with transposed weights
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

print("LoRA Config:")
print(f"  Rank (r): {lora_config.r}")
print(f"  Alpha: {lora_config.lora_alpha}")
print(f"  Target modules: {lora_config.target_modules}")
print(f"  Dropout: {lora_config.lora_dropout}")
print(f"  Bias: {lora_config.bias}")

# Load a small model to demonstrate (GPT-2, ~124M parameters)
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Count parameters before wrapping: get_peft_model() injects adapters into `model` in place
base_params = sum(p.numel() for p in model.parameters())

# Apply LoRA: base weights are frozen, only the adapter weights remain trainable
peft_model = get_peft_model(model, lora_config)

trainable = sum(p.numel() for p in peft_model.parameters() if p.requires_grad)
total = sum(p.numel() for p in peft_model.parameters())

print(f"\nOriginal model parameters: {base_params:,}")
print(f"Trainable parameters after LoRA: {trainable:,}")
print(f"Total PEFT model parameters: {total:,}")
print(f"\nTrainable share: {trainable / total * 100:.2f}% of all parameters")

Output

LoRA Config:
  Rank (r): 8
  Alpha: 16
  Target modules: {'c_attn'}
  Dropout: 0.05
  Bias: none

Original model parameters: 124,439,808
Trainable parameters after LoRA: 294,912
Total PEFT model parameters: 124,734,720

Trainable share: 0.24% of all parameters

What just happened?

We created a common starting LoRA configuration, then applied it to GPT-2 using <code>get_peft_model()</code>. This attached a pair of small adapter matrices to the fused attention projection (c_attn) in each of GPT-2's 12 layers. The original ~124M parameters remain frozen; only 294,912 adapter parameters (12 layers × 8 × (768 + 2304)) become trainable, roughly 420x fewer than the base model. The forward pass still runs the whole model, but gradients and optimizer states are kept only for the adapters, which is what lets fine-tuning fit on consumer GPUs.

Common gotcha

Developers often forget that target_modules must match the actual module names in their specific model. Llama and Mistral use ["q_proj", "k_proj", "v_proj", "o_proj"] for attention, but other architectures differ: GPT-2 uses "c_attn" and Falcon uses "query_key_value". If you get a 'target modules not found' error at runtime, list your model's module names first with for name, _ in model.named_modules(): print(name).

Error recovery

ValueError: Target modules {'q_proj', 'v_proj'} not found in the base model

Your target module names don't match the model architecture. Run <code>for name, _ in model.named_modules(): print(name)</code>, find the attention layer names, and update <code>target_modules</code> to match the last part of those names. For GPT-2 it's 'c_attn'; for Llama it's 'q_proj', 'v_proj', etc.
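Scanning a long list of module names by eye is tedious. A small helper like the hypothetical candidate_target_modules below (not part of PEFT or Transformers) counts the last name component of every linear-style layer. It assumes those layers' classes are named Linear (PyTorch's nn.Linear) or Conv1D (the class GPT-2 uses in Transformers), and because it only reads class names, it works on any iterable of (name, module) pairs such as model.named_modules().

```python
from collections import Counter
from typing import Any, Iterable, Tuple


def candidate_target_modules(
    named_modules: Iterable[Tuple[str, Any]],
    layer_types: Tuple[str, ...] = ("Linear", "Conv1D"),
) -> Counter:
    """Count short module names (last dotted component) for linear-style layers.

    Hypothetical helper: it matches on class name only, so pass e.g. model.named_modules().
    """
    counts: Counter = Counter()
    for name, module in named_modules:
        # The root module has an empty name; skip it and anything that isn't linear-style.
        if name and type(module).__name__ in layer_types:
            counts[name.rsplit(".", 1)[-1]] += 1
    return counts
```

For example, print(candidate_target_modules(model.named_modules())) on GPT-2 should list c_attn among the candidates; those short names are the form target_modules expects.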

RuntimeError: Expected all tensors to be on the same device

Your PEFT model is on CPU but data is on GPU (or vice versa). After creating the PEFT model with <code>peft_model = get_peft_model(model, lora_config)</code>, call <code>peft_model.to('cuda')</code> to match your batch device.

AttributeError: 'LoraConfig' object has no attribute 'xyz'

Either the attribute name has a typo, or it belongs to a newer PEFT release than the one you have installed. Check the LoraConfig documentation for your installed version and, if needed, upgrade with <code>pip install --upgrade peft</code>.

Experienced dev note

The lora_alpha parameter is the sneaky one. PEFT multiplies each adapter's output by lora_alpha / r, so alpha sets how strongly the adapter's update is applied, which acts much like a learning-rate multiplier for the adapters. A common rule of thumb is lora_alpha = 2 * r (so if r=8, use lora_alpha=16). If your loss doesn't move, raise lora_alpha; if it's erratic, lower it. Getting this one parameter right saves a lot of 'why isn't my model learning?' debugging.
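The rule of thumb is easier to see with numbers. This sketch assumes the standard LoRA scaling lora_alpha / r and shows that keeping lora_alpha = 2 * r holds the multiplier constant, while changing r alone does not. The function name lora_scaling is illustrative only.

```python
def lora_scaling(lora_alpha: float, r: int) -> float:
    """Multiplier applied to the adapter output under standard LoRA scaling."""
    return lora_alpha / r


# (r, lora_alpha) pairs: the first three follow lora_alpha = 2 * r;
# the last lowers r from 16 to 8 but keeps lora_alpha=32, so the scaling doubles.
for r, alpha in [(8, 16), (16, 32), (32, 64), (8, 32)]:
    print(f"r={r:>2}, lora_alpha={alpha:>2} -> scaling {lora_scaling(alpha, r):.1f}")
```

The first three lines print a scaling of 2.0 and the last prints 4.0.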

Check your understanding

Your training loop crashes with 'CUDA out of memory' on a 24GB GPU. You adjust the LoRA config from r=16 to r=8, and now it works. Why did this fix the problem, and why didn't you need to change the learning rate?

Answer hint

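The correct answer must explain that lowering <code>r</code> reduces the number of trainable parameters (and therefore their gradients and optimizer states, such as Adam's momentum buffers), which lowers peak memory. The saving is real but usually small next to the frozen base weights, so it only fixes an out-of-memory error when you were already close to the limit. The learning rate doesn't need to change because the failure was about memory, not convergence. One caveat: PEFT scales the adapter output by lora_alpha / r, so halving r while keeping lora_alpha fixed doubles that scale; halve lora_alpha too if you want the update strength unchanged. (A common wrong answer: 'smaller r trains faster'. It barely does; the main effect is lower memory.)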
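How much does halving r actually save? A back-of-envelope sketch, assuming Llama-2-7B-sized layers (32 layers, hidden size 4096, adapters on q_proj and v_proj), adapters trained in fp32 with Adam, and 16 bytes per trainable parameter (4 for the weight, 4 for the gradient, 8 for Adam's two moment buffers). Activations and the frozen base weights are left out, and adapter_training_mib is a hypothetical helper.

```python
# fp32 weight + fp32 gradient + two fp32 Adam moments (assumed)
BYTES_PER_TRAINABLE_PARAM = 4 + 4 + 8


def adapter_training_mib(r: int, num_layers: int = 32, hidden: int = 4096, num_targets: int = 2) -> float:
    """Approximate memory for LoRA adapter weights, gradients and Adam state, in MiB."""
    trainable = num_layers * num_targets * r * (hidden + hidden)
    return trainable * BYTES_PER_TRAINABLE_PARAM / 2**20


for r in (16, 8):
    print(f"r={r:>2}: ~{adapter_training_mib(r):.0f} MiB")
print(f"saved: ~{adapter_training_mib(16) - adapter_training_mib(8):.0f} MiB")
```

Under these assumptions the saving is tens of megabytes, small next to a 24GB card, which is why lowering r only rescues a run that was already near the limit.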

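VERSION target_modules accepts either a list of module names or a single string; a string is treated as a regular expression that must match the full module name, and recent PEFT releases also accept the shortcut "all-linear". Which forms are available depends on your installed release, so check the LoraConfig docs for your version. Verify your version with import peft; print(peft.__version__).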
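The list form and the string form behave differently, and a simplified sketch of the matching rule makes that visible. It assumes a list entry matches a module name exactly or as its final dotted component, while a string must match the whole name as a regular expression. matches_target is a hypothetical helper, not PEFT's code, and it ignores special values such as "all-linear".

```python
import re
from typing import List, Union


def matches_target(module_name: str, target_modules: Union[str, List[str]]) -> bool:
    """Simplified (assumed) matching rule for target_modules.

    - str: regular expression that must match the entire module name
    - list: each entry matches the full name or its last dotted component
    """
    if isinstance(target_modules, str):
        return re.fullmatch(target_modules, module_name) is not None
    return any(
        module_name == target or module_name.endswith("." + target)
        for target in target_modules
    )


names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.mlp.up_proj",
]
print([n for n in names if matches_target(n, ["q_proj", "v_proj"])])
# ['model.layers.0.self_attn.q_proj']
print([n for n in names if matches_target(n, r".*self_attn\.(q|k)_proj")])
# ['model.layers.0.self_attn.q_proj', 'model.layers.0.self_attn.k_proj']
print([n for n in names if matches_target(n, "q_proj")])
# [] : a bare string must match the whole name
```

The last line is the usual surprise: passing "q_proj" as a plain string instead of ["q_proj"] matches nothing under this rule.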

Next, you'll pass this LoRA config to <code>SFTTrainer</code> and see how it actually applies these adapters during training: we'll load real training data and watch the adapter matrices update.
