How to · Intermediate · 3 min read

How to configure LoRA with PEFT

Quick answer
Use the peft library to configure LoRA by creating a LoraConfig object specifying parameters like r, lora_alpha, and target_modules, then apply it to a Hugging Face model with get_peft_model. This enables parameter-efficient fine-tuning by injecting low-rank adapters into the model.
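To see what "injecting low-rank adapters" means concretely, here is a minimal pure-PyTorch sketch of the LoRA update itself, W·x plus a scaled low-rank correction (alpha/r)·B·A·x. The dimensions and variable names are illustrative, not part of the peft API:

```python
import torch

torch.manual_seed(0)

d, r, alpha = 8, 2, 4          # hidden size, LoRA rank, scaling numerator
W = torch.randn(d, d)          # frozen pretrained weight
A = torch.randn(r, d) * 0.01   # LoRA "down" matrix (trained)
B = torch.zeros(d, r)          # LoRA "up" matrix, zero-initialized (trained)

x = torch.randn(1, d)

# LoRA forward pass: base output plus scaled low-rank update
y = x @ W.T + (alpha / r) * (x @ A.T @ B.T)

# Because B starts at zero, the adapted layer initially matches the base layer
assert torch.allclose(y, x @ W.T)
```

Only A and B are trained, which is why LoRA touches a tiny fraction of the model's parameters.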

PREREQUISITES

  • Python 3.8+
  • pip install transformers peft torch
  • Basic familiarity with Hugging Face Transformers

Setup

Install the required Python packages: transformers for model loading, peft for LoRA configuration, and torch for PyTorch support.

bash
pip install transformers peft torch

Step by step

This example shows how to load a Hugging Face causal language model, configure LoRA with LoraConfig, and apply it using get_peft_model. The model is then ready for fine-tuning with LoRA adapters.

python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType
import torch

# Load base model and tokenizer
model_name = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Configure LoRA
lora_config = LoraConfig(
    r=16,                    # Rank of LoRA matrices
    lora_alpha=32,           # Scaling factor
    target_modules=["q_proj", "v_proj"],  # Modules to apply LoRA
    lora_dropout=0.05,       # Dropout for LoRA layers
    task_type=TaskType.CAUSAL_LM
)

# Apply LoRA to the model
model = get_peft_model(model, lora_config)

# Example input
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)

# Forward pass
outputs = model(**inputs)
print("Logits shape:", outputs.logits.shape)
output
Logits shape: torch.Size([1, 7, 128256])
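After calling get_peft_model, you can call model.print_trainable_parameters() to confirm that only the adapter weights are trainable. The snippet below reproduces that count in pure PyTorch on a single toy layer (toy sizes, not the real model, so the percentage is much higher than you would see on an 8B model):

```python
import torch
import torch.nn as nn

d, r = 64, 16
base = nn.Linear(d, d, bias=False)        # stands in for a frozen q_proj/v_proj
for p in base.parameters():
    p.requires_grad_(False)               # freeze the pretrained weight

lora_A = nn.Parameter(torch.zeros(r, d))  # trainable low-rank factors
lora_B = nn.Parameter(torch.zeros(d, r))

trainable = lora_A.numel() + lora_B.numel()
total = sum(p.numel() for p in base.parameters()) + trainable
print(f"trainable: {trainable} / total: {total} "
      f"({100 * trainable / total:.1f}%)")
```

On a full LLM the frozen base dwarfs the adapters, so the trainable fraction typically lands well under 1%.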

Common variations

  • Use BitsAndBytesConfig with load_in_4bit=True for 4-bit quantization combined with LoRA (QLoRA).
  • Change target_modules to match your model architecture (e.g., ["query_key_value"] for some models).
  • Use task_type=TaskType.SEQ_2_SEQ_LM for encoder-decoder models.
  • For distributed or multi-GPU training, integrate with PyTorch Lightning or Hugging Face Accelerate.
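The QLoRA variation from the list above can be sketched as follows. This is a configuration fragment using commonly chosen 4-bit settings; it requires the bitsandbytes package and a CUDA GPU, and the model name repeats the one used in the main example:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantization settings commonly used with QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",               # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,   # dtype for matmuls
    bnb_4bit_use_double_quant=True,          # quantize the quantization constants
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
# Then apply the same LoraConfig / get_peft_model steps as in the main example.
```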

Troubleshooting

  • If you get ModuleNotFoundError, ensure peft is installed correctly.
  • Mismatch in target_modules causes no LoRA layers to be applied; verify module names with model.named_modules().
  • Out of memory errors: try using 4-bit quantization with BitsAndBytesConfig or smaller batch sizes.
  • Ensure your model supports fine-tuning; loading with device_map="auto" spreads layers across available GPUs and helps avoid out-of-memory errors at load time.
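For the target_modules mismatch above, the fix is to list the Linear submodule names and pick from those. The snippet below does this on a toy attention block (a stand-in for one layer of a real model; on a real model you would iterate model.named_modules() the same way):

```python
import torch.nn as nn

# A toy attention block standing in for one layer of a real model
class ToyAttention(nn.Module):
    def __init__(self, d=32):
        super().__init__()
        self.q_proj = nn.Linear(d, d)
        self.k_proj = nn.Linear(d, d)
        self.v_proj = nn.Linear(d, d)
        self.o_proj = nn.Linear(d, d)

model = ToyAttention()

# Collect the names of all Linear submodules: these are the valid
# candidates for target_modules
linear_names = sorted(
    name.split(".")[-1]
    for name, module in model.named_modules()
    if isinstance(module, nn.Linear)
)
print(linear_names)  # ['k_proj', 'o_proj', 'q_proj', 'v_proj']
```

If none of these names appear in your LoraConfig's target_modules, peft will not wrap any layer and the model trains nothing.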

Key Takeaways

  • Use LoraConfig and get_peft_model from peft to enable LoRA fine-tuning.
  • Specify target_modules carefully to match your model's attention projection layers.
  • Combine LoRA with quantization (QLoRA) for memory-efficient training on large models.
  • Load large models with device_map="auto" to spread layers across available GPUs.
  • Check module names and installation if LoRA layers are not applied or errors occur.
Verified 2026-04 · meta-llama/Llama-3.1-8B-Instruct