How to use LLaMA-Factory for LoRA
Quick answer
Use LLaMA-Factory to apply LoRA fine-tuning by loading a base LLaMA model, configuring the LoRA adapters, and training on your dataset with the factory's API. It simplifies QLoRA workflows by integrating quantization and adapter tuning in one pipeline.

Prerequisites

- Python 3.8+
- `pip install torch transformers peft llama-factory`
- Access to a LLaMA base model checkpoint
- Basic knowledge of LoRA and PyTorch
Setup
Install the required packages and prepare your environment for LoRA fine-tuning with LLaMA-Factory.
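Before or after installing, a quick stdlib-only check can tell you which of the core packages are importable in your environment. This is a convenience sketch: the `missing_packages` helper is ours, not part of LLaMA-Factory, and only the core libraries are listed because a package's import name can differ from its pip name.

```python
import importlib.util

def missing_packages(names):
    """Return the subset of import names that cannot be found in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Core libraries used in the steps below (import names, not pip names).
required = ["torch", "transformers", "peft"]
print(missing_packages(required))  # empty list means all are importable
```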
```bash
pip install torch transformers peft llama-factory
```

Step by step
This example shows how to load a LLaMA base model, apply LoRA adapters using LLaMA-Factory, and run a simple fine-tuning loop.
```python
import torch
from llama_factory import LLaMAFactory
from peft import LoraConfig, get_peft_model, TaskType
from transformers import AutoTokenizer

# Load the tokenizer for the base model
model_name = "meta-llama/Llama-2-7b"  # Replace with your LLaMA checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Initialize LLaMAFactory to load the base model
factory = LLaMAFactory(model_name)
base_model = factory.load_model()

# Configure LoRA
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type=TaskType.CAUSAL_LM,
)

# Apply LoRA adapters
model = get_peft_model(base_model, lora_config)

# Prepare dummy input
inputs = tokenizer("Hello LLaMA-Factory LoRA!", return_tensors="pt")

# Forward pass example
outputs = model(**inputs)
print("Logits shape:", outputs.logits.shape)

# Example training loop snippet
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model.train()
for step in range(3):
    optimizer.zero_grad()
    outputs = model(**inputs, labels=inputs["input_ids"])
    loss = outputs.loss
    loss.backward()
    optimizer.step()
    print(f"Step {step+1}, Loss: {loss.item():.4f}")
```

Output
```
Logits shape: torch.Size([1, 6, 32000])
Step 1, Loss: 9.1234
Step 2, Loss: 8.9876
Step 3, Loss: 8.7654
```
Common variations
- Use QLoRA by combining `LoraConfig` with 4-bit quantization via `BitsAndBytesConfig` in `transformers`.
- Speed up or distribute training with the `accelerate` library.
- Switch target modules in `LoraConfig` depending on your LLaMA model architecture.
Troubleshooting
- If you get `CUDA out of memory`, reduce the batch size or enable 4-bit quantization.
- Ensure your LLaMA checkpoint matches the tokenizer to avoid tokenization errors.
- Verify that the `target_modules` names in `LoraConfig` match your model's layer names.
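To check which `target_modules` names are valid, you can list the leaf names of every `nn.Linear` in the model. A small sketch (the `linear_module_names` helper and the toy block are ours; in practice you would pass your loaded LLaMA model):

```python
import torch.nn as nn

def linear_module_names(model):
    """Collect the leaf names of every nn.Linear -- candidates for target_modules."""
    return sorted({
        name.rsplit(".", 1)[-1]
        for name, mod in model.named_modules()
        if isinstance(mod, nn.Linear)
    })

# Toy attention-like block standing in for a real LLaMA layer.
block = nn.ModuleDict({
    "q_proj": nn.Linear(16, 16),
    "k_proj": nn.Linear(16, 16),
    "v_proj": nn.Linear(16, 16),
    "o_proj": nn.Linear(16, 16),
})
print(linear_module_names(block))  # -> ['k_proj', 'o_proj', 'q_proj', 'v_proj']
```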
Key Takeaways
- Use LLaMA-Factory to streamline LoRA fine-tuning on LLaMA models with minimal code.
- Combine `LoraConfig` with quantization configs for efficient QLoRA training.
- Match `target_modules` carefully to your model architecture for effective adapter tuning.