How to merge LoRA weights with base model
Quick answer

To merge LoRA weights with a base model, load both the base model and the LoRA adapter, apply the LoRA update to the base model's parameters, and save the merged model. The result is a standalone model with the fine-tuned weights integrated, so no separate adapter is needed during inference.

Prerequisites

- Python 3.8+
- `pip install "transformers>=4.30.0"`
- `pip install "peft>=0.3.0"`
- Basic knowledge of Hugging Face Transformers and LoRA
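Conceptually, merging folds each low-rank update into its base weight matrix: `W_merged = W + (alpha / r) * B @ A`. The NumPy sketch below illustrates that arithmetic only; the shapes and values are made up for the demonstration, and NumPy is not required by `peft` itself.

```python
import numpy as np

# Illustrative shapes: a 16x16 weight with a rank-4 LoRA update.
d, r, alpha = 16, 4, 8
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))   # frozen base weight
A = rng.standard_normal((r, d))   # LoRA down-projection
B = rng.standard_normal((d, r))   # LoRA up-projection (trained)

scaling = alpha / r
W_merged = W + scaling * (B @ A)  # fold the update into the base weight

# The merged weight reproduces "base output + scaled adapter output".
x = rng.standard_normal(d)
y_adapter = W @ x + scaling * (B @ (A @ x))
y_merged = W_merged @ x
assert np.allclose(y_adapter, y_merged)
```

This is why a merged model needs no adapter at inference time: the low-rank product is already baked into the dense weights.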
Setup

Install the required libraries, `transformers` and `peft`, which provide LoRA fine-tuning and merging support.

```shell
pip install "transformers>=4.30.0" "peft>=0.3.0"
```

Step by step
This example shows how to load a base model and LoRA weights, merge them, and save the merged model locally.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import os

# Load base model and tokenizer
base_model_name = "huggyllama/llama-7b"
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
base_model = AutoModelForCausalLM.from_pretrained(base_model_name)

# Load LoRA adapter on top of the base model
lora_model_path = "./lora_adapter"
lora_model = PeftModel.from_pretrained(base_model, lora_model_path)

# Merge LoRA weights into the base model and drop the adapter layers
merged_model = lora_model.merge_and_unload()

# Save merged model and tokenizer
output_dir = "./merged_model"
os.makedirs(output_dir, exist_ok=True)
merged_model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)
print(f"Merged model saved to {output_dir}")
```

Output

```
Merged model saved to ./merged_model
```
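After saving, it can be worth sanity-checking that the output directory holds a standalone model rather than adapter files. The helper below is a sketch, not part of the `peft` API; it assumes the file-naming conventions `save_pretrained` uses (`config.json` plus `.bin`/`.safetensors` weight files for full models, `adapter_config.json` for PEFT adapters).

```python
import os
import tempfile

def looks_like_standalone_model(model_dir):
    """True if the directory holds a full merged model, not just a LoRA adapter."""
    files = set(os.listdir(model_dir))
    has_config = "config.json" in files                   # full model config
    has_weights = any(f.endswith((".bin", ".safetensors")) for f in files)
    is_adapter = "adapter_config.json" in files           # PEFT adapter marker
    return has_config and has_weights and not is_adapter

# Demonstrate on a fake "merged model" directory:
with tempfile.TemporaryDirectory() as d:
    for name in ("config.json", "model.safetensors", "tokenizer.json"):
        open(os.path.join(d, name), "w").close()
    merged_ok = looks_like_standalone_model(d)

# Demonstrate on a fake adapter-only directory:
with tempfile.TemporaryDirectory() as d:
    for name in ("adapter_config.json", "adapter_model.safetensors"):
        open(os.path.join(d, name), "w").close()
    adapter_ok = looks_like_standalone_model(d)
```

Here `merged_ok` is `True` and `adapter_ok` is `False`, which is what you want to see after a successful merge.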
Common variations

- Use `AutoModelForSeq2SeqLM` for encoder-decoder models.
- To keep an application responsive during a long merge, run it off the main thread (e.g. `asyncio.to_thread` or `concurrent.futures`); the merge itself is compute-bound, so wrapping it in an `async` function alone won't speed it up.
- Use a different base model or LoRA adapter by changing the model names or paths.
Troubleshooting

- If you get a shape or key mismatch error, ensure the LoRA adapter was trained on the same base model (architecture and size) you are merging into.
- If `merge_and_unload()` is missing, update `peft` to the latest version.
- Check that the LoRA adapter path is correct and contains the adapter weights (`adapter_config.json` and an `adapter_model` file).
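The adapter-path checks above can be scripted. The helper below is an illustrative sketch, not part of `peft`; it assumes the standard PEFT file names (`adapter_config.json` and an `adapter_model.bin` or `adapter_model.safetensors` weights file).

```python
import os
import tempfile

def check_adapter_dir(path):
    """Return a list of problems with a LoRA adapter directory (empty = looks OK)."""
    if not os.path.isdir(path):
        return [f"{path} is not a directory"]
    files = set(os.listdir(path))
    problems = []
    if "adapter_config.json" not in files:
        problems.append("missing adapter_config.json")
    if not any(f.startswith("adapter_model") for f in files):
        problems.append("missing adapter weights (adapter_model.bin or .safetensors)")
    return problems

# Example: an empty directory is missing both required files.
with tempfile.TemporaryDirectory() as d:
    empty_problems = check_adapter_dir(d)

# Example: a directory with both files passes the check.
with tempfile.TemporaryDirectory() as d:
    for name in ("adapter_config.json", "adapter_model.safetensors"):
        open(os.path.join(d, name), "w").close()
    ok_problems = check_adapter_dir(d)
```

Running such a check before calling `PeftModel.from_pretrained` turns a cryptic load error into a readable message.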
Key Takeaways

- Use the `peft` library's `merge_and_unload()` method to combine LoRA weights with the base model.
- Always save both the merged model and the tokenizer for standalone inference without adapters.
- Ensure the base model and LoRA adapter are compatible to avoid errors during merging.