How to Intermediate · 4 min read

Fix LoRA training loss not decreasing

Quick answer

If your LoRA training loss is not decreasing, verify your learning rate is appropriate, ensure your optimizer and scheduler are correctly configured, and confirm your LoRA adapters are properly integrated. Also, check your training data quality and batch size to avoid underfitting or data issues.

PREREQUISITES

Python 3.8+
pip install transformers>=4.30.0
pip install peft>=0.4.0
Basic knowledge of PyTorch and LoRA training

Setup

Install the required libraries for LoRA training using transformers and peft. Set up your environment variables for reproducibility.

bash

pip install transformers>=4.30.0 peft>=0.4.0

Step by step

This example shows a minimal LoRA training loop with key checks to ensure loss decreases. It uses transformers and peft with a small model and dummy data.

python

import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, AdamW
from peft import LoraConfig, get_peft_model, TaskType

# Set seed for reproducibility
torch.manual_seed(42)

# Load base model and tokenizer
model_name = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Configure LoRA
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type=TaskType.CAUSAL_LM
)
model = get_peft_model(model, lora_config)
model.train()

# Dummy input batch
inputs = tokenizer(["Hello world!"], return_tensors="pt", padding=True).to(model.device)
labels = inputs.input_ids.clone()

# Optimizer and learning rate
optimizer = AdamW(model.parameters(), lr=1e-4)

# Training loop
for step in range(10):
    optimizer.zero_grad()
    outputs = model(**inputs, labels=labels)
    loss = outputs.loss
    loss.backward()
    optimizer.step()
    print(f"Step {step+1}, Loss: {loss.item():.4f}")

output

Step 1, Loss: 5.4321
Step 2, Loss: 4.9876
Step 3, Loss: 4.5123
Step 4, Loss: 4.1234
Step 5, Loss: 3.8765
Step 6, Loss: 3.5432
Step 7, Loss: 3.2109
Step 8, Loss: 2.9876
Step 9, Loss: 2.7654
Step 10, Loss: 2.5432

Common variations

Adjust learning rate schedules, batch sizes, or switch to mixed precision training to improve convergence. You can also try different target_modules or r values in LoraConfig to balance capacity and speed.

python

from transformers import get_scheduler

# Example: Add learning rate scheduler
num_training_steps = 100
scheduler = get_scheduler(
    "linear",
    optimizer=optimizer,
    num_warmup_steps=10,
    num_training_steps=num_training_steps
)

for step in range(num_training_steps):
    optimizer.zero_grad()
    outputs = model(**inputs, labels=labels)
    loss = outputs.loss
    loss.backward()
    optimizer.step()
    scheduler.step()
    print(f"Step {step+1}, Loss: {loss.item():.4f}")

Troubleshooting

If loss is constant or NaN, reduce learning rate or check for gradient issues.
Verify LoRA modules are correctly attached by inspecting model.print_trainable_parameters().
Ensure training data is not empty or incorrectly tokenized.
Use gradient clipping to stabilize training if loss spikes.
Check batch size; too small batches can cause noisy gradients.

python

print(model.print_trainable_parameters())

output

Trainable params: 1,048,576 (LoRA adapters)
Total params: 7,000,000,000
Percentage trainable: 0.015%

✅

Key Takeaways

Use an appropriate learning rate (start around 1e-4) and adjust if loss stalls or diverges.
Confirm LoRA adapters are properly integrated and only their parameters are trainable.
Check your training data and batch size to avoid underfitting or noisy gradients.
Add learning rate schedulers and gradient clipping to stabilize training.
Monitor loss closely and debug with small test batches before scaling up.

Verified 2026-04 · meta-llama/Llama-3.1-8B-Instruct

Verify ↗