Fix LoRA training loss not decreasing
Quick answer
If your LoRA training loss is not decreasing, verify your learning rate is appropriate, ensure your optimizer and scheduler are correctly configured, and confirm your LoRA adapters are properly integrated. Also, check your training data quality and batch size to avoid underfitting or data issues.
PREREQUISITES
Python 3.8+pip install transformers>=4.30.0pip install peft>=0.4.0Basic knowledge of PyTorch and LoRA training
Setup
Install the required libraries for LoRA training using transformers and peft. Set up your environment variables for reproducibility.
pip install transformers>=4.30.0 peft>=0.4.0 Step by step
This example shows a minimal LoRA training loop with key checks to ensure loss decreases. It uses transformers and peft with a small model and dummy data.
import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, AdamW
from peft import LoraConfig, get_peft_model, TaskType
# Set seed for reproducibility
torch.manual_seed(42)
# Load base model and tokenizer
model_name = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
# Configure LoRA
lora_config = LoraConfig(
r=16,
lora_alpha=32,
target_modules=["q_proj", "v_proj"],
lora_dropout=0.05,
task_type=TaskType.CAUSAL_LM
)
model = get_peft_model(model, lora_config)
model.train()
# Dummy input batch
inputs = tokenizer(["Hello world!"], return_tensors="pt", padding=True).to(model.device)
labels = inputs.input_ids.clone()
# Optimizer and learning rate
optimizer = AdamW(model.parameters(), lr=1e-4)
# Training loop
for step in range(10):
optimizer.zero_grad()
outputs = model(**inputs, labels=labels)
loss = outputs.loss
loss.backward()
optimizer.step()
print(f"Step {step+1}, Loss: {loss.item():.4f}") output
Step 1, Loss: 5.4321 Step 2, Loss: 4.9876 Step 3, Loss: 4.5123 Step 4, Loss: 4.1234 Step 5, Loss: 3.8765 Step 6, Loss: 3.5432 Step 7, Loss: 3.2109 Step 8, Loss: 2.9876 Step 9, Loss: 2.7654 Step 10, Loss: 2.5432
Common variations
Adjust learning rate schedules, batch sizes, or switch to mixed precision training to improve convergence. You can also try different target_modules or r values in LoraConfig to balance capacity and speed.
from transformers import get_scheduler
# Example: Add learning rate scheduler
num_training_steps = 100
scheduler = get_scheduler(
"linear",
optimizer=optimizer,
num_warmup_steps=10,
num_training_steps=num_training_steps
)
for step in range(num_training_steps):
optimizer.zero_grad()
outputs = model(**inputs, labels=labels)
loss = outputs.loss
loss.backward()
optimizer.step()
scheduler.step()
print(f"Step {step+1}, Loss: {loss.item():.4f}") Troubleshooting
- If loss is constant or NaN, reduce learning rate or check for gradient issues.
- Verify LoRA modules are correctly attached by inspecting model.print_trainable_parameters().
- Ensure training data is not empty or incorrectly tokenized.
- Use gradient clipping to stabilize training if loss spikes.
- Check batch size; too small batches can cause noisy gradients.
print(model.print_trainable_parameters()) output
Trainable params: 1,048,576 (LoRA adapters) Total params: 7,000,000,000 Percentage trainable: 0.015%
Key Takeaways
- Use an appropriate learning rate (start around 1e-4) and adjust if loss stalls or diverges.
- Confirm LoRA adapters are properly integrated and only their parameters are trainable.
- Check your training data and batch size to avoid underfitting or noisy gradients.
- Add learning rate schedulers and gradient clipping to stabilize training.
- Monitor loss closely and debug with small test batches before scaling up.