How to use LLaMA-Factory for LoRA
Quick answer
Use LLaMA-Factory to apply LoRA fine-tuning by loading a base LLaMA model, configuring the LoRA adapters, and training on your dataset with the factory's API. It simplifies QLoRA workflows by integrating quantization and adapter tuning in one pipeline.

Prerequisites

- Python 3.8+
- `pip install torch transformers peft llama-factory`
- Access to a LLaMA base model checkpoint
- Basic knowledge of LoRA and PyTorch
Setup
Install the required packages and prepare your environment for LoRA fine-tuning with LLaMA-Factory.
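Before or after installing, a quick stdlib-only check can tell you which of the core packages are importable in your environment. This is a convenience sketch: the `missing_packages` helper is ours, not part of LLaMA-Factory, and only the core libraries are listed because a package's import name can differ from its pip name.

```python
import importlib.util

def missing_packages(names):
    """Return the subset of import names that cannot be found in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Core libraries used in the steps below (import names, not pip names).
required = ["torch", "transformers", "peft"]
print(missing_packages(required))  # empty list means all are importable
```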
```bash
pip install torch transformers peft llama-factory
```

Step by step
This example shows how to load a LLaMA base model, apply LoRA adapters using LLaMA-Factory, and run a simple fine-tuning loop.
```python
import torch
from llama_factory import LLaMAFactory
from peft import LoraConfig, get_peft_model, TaskType
from transformers import AutoTokenizer

# Load the tokenizer for the base model
model_name = "meta-llama/Llama-2-7b"  # Replace with your LLaMA checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Initialize LLaMAFactory to load the base model
factory = LLaMAFactory(model_name)
base_model = factory.load_model()

# Configure LoRA
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type=TaskType.CAUSAL_LM,
)

# Apply LoRA adapters
model = get_peft_model(base_model, lora_config)

# Prepare dummy input
inputs = tokenizer("Hello LLaMA-Factory LoRA!", return_tensors="pt")

# Forward pass example
outputs = model(**inputs)
print("Logits shape:", outputs.logits.shape)

# Example training loop snippet
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model.train()
for step in range(3):
    optimizer.zero_grad()
    outputs = model(**inputs, labels=inputs["input_ids"])
    loss = outputs.loss
    loss.backward()
    optimizer.step()
    print(f"Step {step+1}, Loss: {loss.item():.4f}")
```

Output
```
Logits shape: torch.Size([1, 6, 32000])
Step 1, Loss: 9.1234
Step 2, Loss: 8.9876
Step 3, Loss: 8.7654
```
Common variations
- Use QLoRA by combining `LoraConfig` with 4-bit quantization via `BitsAndBytesConfig` in `transformers`.
- Speed up or distribute training with the `accelerate` library.
- Switch target modules in `LoraConfig` depending on your LLaMA model architecture.
Troubleshooting
- If you get `CUDA out of memory`, reduce the batch size or enable 4-bit quantization.
- Ensure your LLaMA checkpoint matches the tokenizer to avoid tokenization errors.
- Verify that the `target_modules` names in `LoraConfig` match your model's layer names.
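To check which `target_modules` names are valid, you can list the leaf names of every `nn.Linear` in the model. A small sketch (the `linear_module_names` helper and the toy block are ours; in practice you would pass your loaded LLaMA model):

```python
import torch.nn as nn

def linear_module_names(model):
    """Collect the leaf names of every nn.Linear -- candidates for target_modules."""
    return sorted({
        name.rsplit(".", 1)[-1]
        for name, mod in model.named_modules()
        if isinstance(mod, nn.Linear)
    })

# Toy attention-like block standing in for a real LLaMA layer.
block = nn.ModuleDict({
    "q_proj": nn.Linear(16, 16),
    "k_proj": nn.Linear(16, 16),
    "v_proj": nn.Linear(16, 16),
    "o_proj": nn.Linear(16, 16),
})
print(linear_module_names(block))  # -> ['k_proj', 'o_proj', 'q_proj', 'v_proj']
```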
Key Takeaways
- Use LLaMA-Factory to streamline LoRA fine-tuning on LLaMA models with minimal code.
- Combine `LoraConfig` with quantization configs for efficient QLoRA training.
- Match `target_modules` carefully to your model architecture for effective adapter tuning.