Saving PEFT adapter weights
Why this matters
In production, storing dozens of full model copies for different tasks is prohibitively expensive. PEFT adapters avoid this by saving only the small set of adapter weights trained during fine-tuning, often well under 1% of the model's parameters, which cuts per-task storage from gigabytes to megabytes while preserving task-specific performance.
Explanation
PEFT (Parameter-Efficient Fine-Tuning) adapters are small trainable modules inserted into a frozen base model. Instead of saving the entire 7B-parameter model, you save only the adapter weights, typically a few MB. Mechanically, with transformers 5.5.x and the peft library, you call save_pretrained() on the PEFT-wrapped model (not on the adapter config object). This writes an adapter_config.json metadata file and a weights file to disk: adapter_model.safetensors in current PEFT releases, adapter_model.bin in older ones. To load an adapter back, PeftModel.from_pretrained() or load_adapter() reconstructs the adapter from those files and injects it into the frozen base model. When to use this: whenever you are fine-tuning a large model for a specific domain or task and need to deploy multiple task-specific versions, or when compute or storage is constrained and you cannot afford full model copies.
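To make the on-disk layout concrete, here is a minimal sketch that inspects a saved adapter directory without loading any model. The helper name describe_adapter_dir is hypothetical, and the sketch assumes a LoRA-style adapter_config.json with the usual peft_type, base_model_name_or_path, and r keys; other adapter types may store different fields.

```python
import json
from pathlib import Path

# File names PEFT is known to write; which weights file appears depends on
# the PEFT version and the safe_serialization setting used when saving.
CONFIG_FILE = "adapter_config.json"
WEIGHT_FILES = ("adapter_model.safetensors", "adapter_model.bin")


def describe_adapter_dir(path):
    """Hypothetical helper: summarize a saved adapter directory without loading it."""
    path = Path(path)
    config_path = path / CONFIG_FILE
    if not config_path.is_file():
        raise FileNotFoundError(f"{CONFIG_FILE} not found in {path}")

    weights = [name for name in WEIGHT_FILES if (path / name).is_file()]
    if not weights:
        raise FileNotFoundError(f"no adapter weights file found in {path}")

    config = json.loads(config_path.read_text())
    return {
        "peft_type": config.get("peft_type"),
        "base_model": config.get("base_model_name_or_path"),
        "rank": config.get("r"),  # assumed LoRA-specific key
        "weights_file": weights[0],
        "weights_bytes": (path / weights[0]).stat().st_size,
    }
```

Running a check like this before loading turns a missing or half-copied adapter directory into an early, readable error rather than a failure deep inside the loading code.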
Analogy
Think of the base model as a pre-built house and the adapter as a set of removable wall inserts or furniture that customizes it for different uses. You ship the house once, then mail lightweight insert kits to different locations without ever reshipping the foundation.
Code
import os
import tempfile

import torch
from peft import LoraConfig, PeftModel, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)

# LoRA on GPT-2's fused QKV projection. GPT-2 implements it as a Conv1D layer,
# so fan_in_fan_out=True tells PEFT the weight is stored as (fan_in, fan_out).
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["c_attn"],
    lora_dropout=0.1,
    bias="none",
    fan_in_fan_out=True,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# print_trainable_parameters() prints the summary itself and returns None,
# so call it directly instead of wrapping it in print().
model.print_trainable_parameters()

with tempfile.TemporaryDirectory() as tmpdir:
    adapter_path = os.path.join(tmpdir, "my_adapter")
    model.save_pretrained(adapter_path)  # writes only the adapter, not the base model
    print(f"Adapter saved to: {adapter_path}")
    print(f"Files created: {sorted(os.listdir(adapter_path))}")

    # Reload: start from a fresh base model and attach the saved adapter.
    base_model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)
    model_reload = PeftModel.from_pretrained(base_model, adapter_path)
    print("Adapter loaded successfully")
    # On a PeftModel, active_adapters is a property, not a method.
    print(f"Active adapters: {model_reload.active_adapters}")

Output (exact formatting may vary slightly between PEFT versions):

trainable params: 294,912 || all params: 124,734,720 || trainable%: 0.2364
Adapter saved to: /tmp/tmpXXXXXXXX/my_adapter
Files created: ['README.md', 'adapter_config.json', 'adapter_model.safetensors']
Adapter loaded successfully
Active adapters: ['default']

What just happened?
The code created a rank-8 LoRA adapter (one type of PEFT), wrapped the frozen GPT-2 model with it, and saved the adapter weights plus metadata to disk. For rank 8 on c_attn that is about 295K parameters, roughly 1.2 MB in float32, compared with about 500 MB for the full model. It then reloaded a fresh base model and attached the saved adapter to it. The active-adapters check confirmed the adapter was bound under the name 'default'.
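The adapter size can be worked out by hand, which is a useful sanity check on whatever print_trainable_parameters() reports. The sketch below assumes GPT-2 small's published dimensions (hidden size 768, 12 transformer blocks, c_attn projecting 768 to 3 x 768); the function name is illustrative.

```python
def lora_param_count(in_features, out_features, rank, num_layers):
    """Parameters added by LoRA: A is (rank x in) and B is (out x rank) per wrapped layer."""
    return num_layers * rank * (in_features + out_features)


# GPT-2 small: hidden size 768, 12 blocks, c_attn maps 768 -> 3 * 768 (fused Q, K, V).
params = lora_param_count(in_features=768, out_features=3 * 768, rank=8, num_layers=12)
size_mb = params * 4 / 1e6  # float32 uses 4 bytes per parameter

print(f"{params:,} LoRA parameters, about {size_mb:.2f} MB in float32")
# prints: 294,912 LoRA parameters, about 1.18 MB in float32
```

Doubling the rank doubles the adapter size, and adding more target modules adds one such term per wrapped layer.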
Common gotcha
A common mistake is calling model.save_pretrained() on a PEFT-wrapped model and expecting a full checkpoint; only the adapter is written. If you need a standalone fine-tuned model, call merge_and_unload(), which returns the base model with the adapter folded into its weights, and then call save_pretrained() on that returned model. The result is full-size, so you give up the storage savings that PEFT offers. Remember that save_pretrained() on a PEFT model is intentionally sparse.
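A minimal sketch of the merge-then-save pattern follows. The helper name save_merged_model is hypothetical; it only assumes the object passed in exposes merge_and_unload(), as a LoRA-wrapped PeftModel does, and that the merged model and tokenizer both expose save_pretrained().

```python
def save_merged_model(peft_model, tokenizer, output_dir):
    """Hypothetical helper: fold adapter weights into the base model and save a full checkpoint."""
    # merge_and_unload() returns the underlying base model with the adapter
    # deltas added into its weights; the PEFT wrapper is no longer involved.
    merged = peft_model.merge_and_unload()
    merged.save_pretrained(output_dir)     # full-size checkpoint, not an adapter
    tokenizer.save_pretrained(output_dir)  # keep tokenizer files alongside it
    return merged
```

If the adapter should remain available on its own, save it with the PEFT model's save_pretrained() to a separate directory before calling a helper like this.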
Error recovery
FileNotFoundError when loading adapter
RuntimeError: 'adapter_model.bin' not found
ValueError: adapter_name 'default' already exists
KeyError with target_modules
Experienced dev note
In transformers 5.5.x, the integration between transformers and PEFT handles most of the plumbing, but adapter device placement still deserves a check. Calling model.to('cuda') on a model that already has adapters moves the adapter weights along with the base weights. With device_map='auto' on a large model, however, the base layers can be spread across several devices, and an adapter loaded afterwards may not land on the same device as the layer it wraps. Always test adapter inference on your target device before deployment. Also, model.merge_and_unload() is one-way: once the adapter is folded into the base weights and the wrapper is discarded, the two cannot be separated again. Save the adapter on its own first, and plan your save/load strategy before training so you do not merge adapters you needed to keep separate.
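One way to check device placement is to group parameter devices by whether the parameter belongs to the adapter. This is a sketch under assumptions: the helper name is hypothetical, and it relies on LoRA parameter names containing "lora_" (as in lora_A and lora_B), which would need adjusting for other adapter types.

```python
def adapter_device_report(model):
    """Hypothetical helper: group parameter devices into adapter vs. base weights."""
    report = {"adapter": set(), "base": set()}
    for name, param in model.named_parameters():
        # PEFT's LoRA layers name their weights lora_A / lora_B.
        kind = "adapter" if "lora_" in name else "base"
        report[kind].add(str(param.device))
    return report
```

If the adapter set contains a device that the base set does not, some adapter weights are sitting somewhere the base model is not, which is worth investigating before deployment.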
Check your understanding
You fine-tuned an adapter for Spanish sentiment analysis and another for English NER, both on the same base Llama model, and saved them separately. At inference you need to switch between tasks. What is the correct way to activate the Spanish adapter, and why might calling save_pretrained() after switching not save both adapters?
Show answer hint
A correct answer covers: (1) load each adapter under its own name, for example load_adapter(path, adapter_name='spanish'), then call set_adapter('spanish') to switch; (2) when adapters are attached through the transformers integration, save_pretrained() writes only the currently active adapter, so save each adapter right after training and validation, before switching to the next one. A PeftModel wrapper behaves differently: by default its save_pretrained() writes every loaded adapter, placing non-default ones in subfolders named after the adapter.
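The switching step can be sketched as a small helper. The function name and the paths in the usage note are hypothetical; the sketch assumes a PeftModel-like object that exposes a peft_config mapping of adapter names plus load_adapter() and set_adapter().

```python
def activate_adapter(model, name, path):
    """Hypothetical helper: load an adapter on first use, then make it the active one."""
    # peft_config maps adapter names to their configs, so membership tells us
    # whether this adapter has already been loaded into the model.
    if name not in model.peft_config:
        model.load_adapter(path, adapter_name=name)
    model.set_adapter(name)
    return name
```

With a model created by PeftModel.from_pretrained(base_model, "adapters/spanish", adapter_name="spanish"), calling activate_adapter(model, "english_ner", "adapters/english_ner") would load the second adapter once, and later calls with either name would only switch the active adapter.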