How to load a LoRA adapter
Quick answer
To load a LoRA adapter, use the PEFT library's LoraConfig and get_peft_model functions with a pretrained Hugging Face model. Initialize your base model, configure the LoRA parameters, then wrap the model with get_peft_model to apply the adapter.
Prerequisites
- Python 3.8+
- pip install transformers peft torch
- A Hugging Face pretrained model checkpoint
Setup
Install the required libraries: transformers for model loading, peft for LoRA support, and torch for PyTorch backend.
pip install transformers peft torch
Step by step
This example loads a pretrained Hugging Face causal language model and applies a LoRA adapter with typical configuration parameters.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType
import torch

# Load base model and tokenizer
model_name = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

# Configure LoRA adapter
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type=TaskType.CAUSAL_LM,
)

# Apply the LoRA adapter to the base model
model = get_peft_model(model, lora_config)

# Model is now ready with LoRA adapter loaded
print("LoRA adapter loaded successfully")
```
Output
LoRA adapter loaded successfully
Common variations
- Use BitsAndBytesConfig to load the base model in 4-bit precision for QLoRA.
- Adjust target_modules depending on the model architecture (different models use different projection-layer names).
- For causal language models, set task_type=TaskType.CAUSAL_LM; for sequence classification, use TaskType.SEQ_CLS.
- LoRA adapters can be saved and loaded separately from the base model: save with model.save_pretrained() and reload with PeftModel.from_pretrained(base_model, peft_model_id).
Troubleshooting
- If you get CUDA out-of-memory errors, try loading the base model with 4-bit quantization using BitsAndBytesConfig(load_in_4bit=True).
- Ensure target_modules matches your model's layer names; otherwise LoRA won't apply correctly.
- Verify that your peft and transformers versions are compatible and up to date.
- If loading a saved LoRA adapter checkpoint, use PeftModel.from_pretrained(base_model, "path_or_hub_id") to load the weights.
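To find valid target_modules names, you can list the Linear layers of the loaded model with named_modules(). The sketch below runs the loop on a tiny stand-in module with Llama-style names (the Block class is illustrative); on a real checkpoint you would iterate over the AutoModelForCausalLM instance instead.

```python
import torch.nn as nn

# Stand-in module with Llama-style layer names (illustrative only)
class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.q_proj = nn.Linear(32, 32)
        self.k_proj = nn.Linear(32, 32)
        self.v_proj = nn.Linear(32, 32)
        self.mlp = nn.Linear(32, 32)

model = Block()

# Every nn.Linear name is a candidate for target_modules
linear_names = [name for name, mod in model.named_modules()
                if isinstance(mod, nn.Linear)]
print(linear_names)  # ['q_proj', 'k_proj', 'v_proj', 'mlp']
```

If a name you pass in target_modules never appears in this list, PEFT has nothing to wrap and the adapter silently does nothing useful.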
Key Takeaways
- Use PEFT's LoraConfig and get_peft_model to load LoRA adapters onto Hugging Face models.
- Configure target_modules and task_type correctly for your model architecture and task.
- Combine LoRA with 4-bit quantization (QLoRA) for efficient fine-tuning on limited hardware.
- Save and load LoRA adapters separately from base models for modular fine-tuning workflows.
- Keep the peft and transformers libraries updated to avoid compatibility issues.