How to merge LoRA weights with base model
Quick answer

To merge LoRA weights with a base model, load both the base model and the LoRA adapter, apply the LoRA update to the base model's parameters, and save the merged model. The result is a standalone model with the fine-tuned weights integrated, so no separate adapter is needed during inference.

Prerequisites

- Python 3.8+
- `pip install "transformers>=4.30.0"`
- `pip install "peft>=0.3.0"`
- Basic knowledge of Hugging Face Transformers and LoRA
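Conceptually, merging folds each low-rank update into its base weight matrix: `W_merged = W + (alpha / r) * B @ A`. The NumPy sketch below illustrates that arithmetic only; the shapes and values are made up for the demonstration, and NumPy is not required by `peft` itself.

```python
import numpy as np

# Illustrative shapes: a 16x16 weight with a rank-4 LoRA update.
d, r, alpha = 16, 4, 8
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))   # frozen base weight
A = rng.standard_normal((r, d))   # LoRA down-projection
B = rng.standard_normal((d, r))   # LoRA up-projection (trained)

scaling = alpha / r
W_merged = W + scaling * (B @ A)  # fold the update into the base weight

# The merged weight reproduces "base output + scaled adapter output".
x = rng.standard_normal(d)
y_adapter = W @ x + scaling * (B @ (A @ x))
y_merged = W_merged @ x
assert np.allclose(y_adapter, y_merged)
```

This is why a merged model needs no adapter at inference time: the low-rank product is already baked into the dense weights.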
Setup

Install the required libraries, `transformers` and `peft`, which provide LoRA fine-tuning and merging support.

```shell
pip install "transformers>=4.30.0" "peft>=0.3.0"
```

Step by step
This example shows how to load a base model and LoRA weights, merge them, and save the merged model locally.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import os

# Load base model and tokenizer
base_model_name = "huggyllama/llama-7b"
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
base_model = AutoModelForCausalLM.from_pretrained(base_model_name)

# Load LoRA adapter on top of the base model
lora_model_path = "./lora_adapter"
lora_model = PeftModel.from_pretrained(base_model, lora_model_path)

# Merge LoRA weights into the base model and drop the adapter layers
merged_model = lora_model.merge_and_unload()

# Save merged model and tokenizer
output_dir = "./merged_model"
os.makedirs(output_dir, exist_ok=True)
merged_model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)
print(f"Merged model saved to {output_dir}")
```

Output

```
Merged model saved to ./merged_model
```
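After saving, it can be worth sanity-checking that the output directory holds a standalone model rather than adapter files. The helper below is a sketch, not part of the `peft` API; it assumes the file-naming conventions `save_pretrained` uses (`config.json` plus `.bin`/`.safetensors` weight files for full models, `adapter_config.json` for PEFT adapters).

```python
import os
import tempfile

def looks_like_standalone_model(model_dir):
    """True if the directory holds a full merged model, not just a LoRA adapter."""
    files = set(os.listdir(model_dir))
    has_config = "config.json" in files                   # full model config
    has_weights = any(f.endswith((".bin", ".safetensors")) for f in files)
    is_adapter = "adapter_config.json" in files           # PEFT adapter marker
    return has_config and has_weights and not is_adapter

# Demonstrate on a fake "merged model" directory:
with tempfile.TemporaryDirectory() as d:
    for name in ("config.json", "model.safetensors", "tokenizer.json"):
        open(os.path.join(d, name), "w").close()
    merged_ok = looks_like_standalone_model(d)

# Demonstrate on a fake adapter-only directory:
with tempfile.TemporaryDirectory() as d:
    for name in ("adapter_config.json", "adapter_model.safetensors"):
        open(os.path.join(d, name), "w").close()
    adapter_ok = looks_like_standalone_model(d)
```

Here `merged_ok` is `True` and `adapter_ok` is `False`, which is what you want to see after a successful merge.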
Common variations

- Use `AutoModelForSeq2SeqLM` for encoder-decoder models.
- To keep an application responsive during a long merge, run it off the main thread (e.g. `asyncio.to_thread` or `concurrent.futures`); the merge itself is compute-bound, so wrapping it in an `async` function alone won't speed it up.
- Use a different base model or LoRA adapter by changing the model names or paths.
Troubleshooting

- If you get a shape or key mismatch error, ensure the LoRA adapter was trained on the same base model (architecture and size) you are merging into.
- If `merge_and_unload()` is missing, update `peft` to the latest version.
- Check that the LoRA adapter path is correct and contains the adapter weights (`adapter_config.json` and an `adapter_model` file).
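The adapter-path checks above can be scripted. The helper below is an illustrative sketch, not part of `peft`; it assumes the standard PEFT file names (`adapter_config.json` and an `adapter_model.bin` or `adapter_model.safetensors` weights file).

```python
import os
import tempfile

def check_adapter_dir(path):
    """Return a list of problems with a LoRA adapter directory (empty = looks OK)."""
    if not os.path.isdir(path):
        return [f"{path} is not a directory"]
    files = set(os.listdir(path))
    problems = []
    if "adapter_config.json" not in files:
        problems.append("missing adapter_config.json")
    if not any(f.startswith("adapter_model") for f in files):
        problems.append("missing adapter weights (adapter_model.bin or .safetensors)")
    return problems

# Example: an empty directory is missing both required files.
with tempfile.TemporaryDirectory() as d:
    empty_problems = check_adapter_dir(d)

# Example: a directory with both files passes the check.
with tempfile.TemporaryDirectory() as d:
    for name in ("adapter_config.json", "adapter_model.safetensors"):
        open(os.path.join(d, name), "w").close()
    ok_problems = check_adapter_dir(d)
```

Running such a check before calling `PeftModel.from_pretrained` turns a cryptic load error into a readable message.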
Key Takeaways

- Use the `peft` library's `merge_and_unload()` method to combine LoRA weights with the base model.
- Always save both the merged model and the tokenizer for standalone inference without adapters.
- Ensure the base model and LoRA adapter are compatible to avoid errors during merging.