How to set LoRA hyperparameters
Quick answer
Set LoRA hyperparameters such as
r (rank), `lora_alpha` (scaling factor), `lora_dropout` (dropout rate), and `target_modules` (the model layers to adapt) to control low-rank adaptation during fine-tuning. These parameters balance model capacity, training stability, and efficiency when applying LoRA or QLoRA techniques.

Prerequisites
- Python 3.8+
- pip install transformers peft bitsandbytes
- Basic knowledge of PyTorch and Hugging Face Transformers
Setup
Install the required libraries for LoRA fine-tuning with Hugging Face Transformers and PEFT:
```bash
pip install transformers peft bitsandbytes
```

Step by step
Define and set LoRA hyperparameters in LoraConfig to customize fine-tuning behavior. Key parameters include:
- `r`: Low-rank dimension controlling adaptation capacity.
- `lora_alpha`: Scaling factor for LoRA updates.
- `lora_dropout`: Dropout rate to regularize LoRA layers.
- `target_modules`: List of model submodules to apply LoRA to.
Example code below shows how to configure and apply LoRA to a causal language model.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType

# Load base model and tokenizer
model_name = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Set LoRA hyperparameters
lora_config = LoraConfig(
    r=16,                                 # Rank of LoRA matrices
    lora_alpha=32,                        # Scaling factor
    target_modules=["q_proj", "v_proj"],  # Modules to adapt
    lora_dropout=0.05,                    # Dropout rate
    task_type=TaskType.CAUSAL_LM,
)

# Apply LoRA to the model
model = get_peft_model(model, lora_config)
print("LoRA configuration applied:", lora_config)
```

Output
```
LoRA configuration applied: LoraConfig(r=16, lora_alpha=32, target_modules=['q_proj', 'v_proj'], lora_dropout=0.05, task_type=TaskType.CAUSAL_LM)
```
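To see how small the adapter actually is, you can estimate the trainable parameters LoRA adds: each adapted weight matrix of shape `(d_out, d_in)` gains two low-rank factors, `A` of shape `(r, d_in)` and `B` of shape `(d_out, r)`. A minimal arithmetic sketch, using an illustrative hidden size of 4096 (the exact projection shapes depend on the model; with grouped-query attention, `v_proj` output dimensions are smaller):

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r, d_in), B is (d_out, r)."""
    return r * d_in + d_out * r

# Illustrative 4096x4096 projection matrix with r=16
per_matrix = lora_param_count(4096, 4096, 16)
print(per_matrix)  # 131072 trainable parameters per adapted matrix
```

In practice, `model.print_trainable_parameters()` on the PEFT-wrapped model reports the exact total; the arithmetic above shows where that number comes from and why the adapter is a tiny fraction of the 8B base weights.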
Common variations
You can adjust LoRA hyperparameters based on your use case:
- Rank (`r`): increase for more capacity but higher memory use; decrease for efficiency.
- Alpha (`lora_alpha`): controls update scaling; typically set equal to or higher than `r`.
- Dropout (`lora_dropout`): use 0.0–0.1 to reduce overfitting.
- Target modules: adapt different layers such as `"k_proj"`, `"q_proj"`, `"v_proj"`, or other transformer submodules, depending on the model architecture.
- QLoRA: combine with 4-bit quantization by loading the model with `BitsAndBytesConfig(load_in_4bit=True)` for memory-efficient fine-tuning.
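The QLoRA variation can be sketched as follows. This assumes a CUDA GPU with the bitsandbytes package installed, and uses the NF4 and double-quantization settings popularized by the QLoRA paper as reasonable defaults, not the only valid choices:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantization settings commonly used with QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype for matmuls during training
    bnb_4bit_use_double_quant=True,         # quantize the quantization constants
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
```

After loading, apply the same `LoraConfig` with `get_peft_model` as in the main example; `peft.prepare_model_for_kbit_training` is commonly called first to stabilize training on quantized weights.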
| Hyperparameter | Description | Typical values |
|---|---|---|
| r | Rank of low-rank matrices | 8, 16, 32 |
| lora_alpha | Scaling factor for LoRA updates | 16, 32, 64 |
| lora_dropout | Dropout rate for LoRA layers | 0.0 to 0.1 |
| target_modules | Model submodules to adapt | ["q_proj", "v_proj"] or others |
| task_type | Type of task for PEFT | TaskType.CAUSAL_LM, TaskType.SEQ_2_SEQ_LM |
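Note that `lora_alpha` does not scale updates directly: in standard (non-rsLoRA) PEFT scaling, the LoRA update is multiplied by `lora_alpha / r`. A quick check of how the typical values in the table interact:

```python
def lora_scale(lora_alpha: int, r: int) -> float:
    """Effective multiplier applied to the LoRA update BAx (standard scaling)."""
    return lora_alpha / r

print(lora_scale(32, 16))  # 2.0 -> the example config doubles the raw update
print(lora_scale(16, 16))  # 1.0
print(lora_scale(64, 32))  # 2.0
```

This is why doubling `r` without adjusting `lora_alpha` halves the effective update strength, and why the common heuristic sets `lora_alpha` at 1–2x `r`.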
Troubleshooting
If you encounter unstable training or poor convergence:
- Try lowering `r` or `lora_alpha` to reduce model complexity.
- Increase `lora_dropout` to 0.1 to regularize.
- Verify that `target_modules` match your model's layer names exactly.
- For QLoRA, ensure `BitsAndBytesConfig` is correctly set up for 4-bit loading.
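To debug a `target_modules` mismatch, list the model's actual submodule names and check them against your config. A small sketch, where `find_matching_modules` is a hypothetical helper that mirrors PEFT's suffix matching on module names:

```python
def find_matching_modules(module_names, targets):
    """Return module names whose final path component is in targets,
    mirroring how PEFT matches target_modules against name suffixes."""
    return [name for name in module_names
            if name.rsplit(".", 1)[-1] in targets]

# In practice: names = [n for n, _ in model.named_modules()]
names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.mlp.gate_proj",
]
print(find_matching_modules(names, {"q_proj", "v_proj"}))
# ['model.layers.0.self_attn.q_proj', 'model.layers.0.self_attn.v_proj']
```

If this kind of scan returns nothing for your config, LoRA is silently adapting no layers; architectures differ (e.g. some models name attention projections differently), so always inspect `model.named_modules()` rather than copying `target_modules` from another model's recipe.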
Key Takeaways
- Set `r` and `lora_alpha` to balance adaptation capacity and training stability.
- Use `lora_dropout` to prevent overfitting during fine-tuning.
- Specify `target_modules` carefully to adapt the relevant model layers.
- Combine LoRA with 4-bit quantization (QLoRA) for memory-efficient fine-tuning.
- Adjust hyperparameters iteratively based on training behavior and resource constraints.