Code Advanced medium · 5 min

Saving PEFT adapter weights

What you will learn

PEFT adapters are tiny weight matrices you save separately from the base model, enabling efficient fine-tuning without storing full model copies.

Why this matters

In production, storing dozens of full model copies for different tasks is prohibitively expensive. PEFT adapters avoid this by saving only the small set of adapter weights added during fine-tuning, typically a few percent of the base model's parameter count or less, which cuts per-task storage by 20-50x or more while maintaining task-specific performance.

Skip if: Don't use PEFT adapter saving if you need to share the model with teams using older transformers versions (<4.38.0), or if your task requires modifying the base model architecture itself rather than just its weights.

Explanation

PEFT (Parameter-Efficient Fine-Tuning) adapters are small trainable modules inserted into a frozen base model. Instead of saving the entire 7B-parameter model, you save only the adapter weights, typically a few MB. Mechanically, the peft library (used here alongside transformers 5.5.x) handles this via the save_pretrained() method on the PEFT-wrapped model, that is, the object returned by get_peft_model(). That call writes an adapter_config.json metadata file and a weights file to disk (adapter_model.safetensors in recent PEFT releases, adapter_model.bin in older ones). When you load adapters back, load_adapter() reads those files and injects the adapter weights into the frozen base model. When to use this: whenever you're fine-tuning a large model for a specific domain or task and need to deploy multiple task-specific versions, or when your storage budget is constrained and you can't afford full model copies.

Analogy

Think of the base model as a pre-built house and the adapter as a set of removable wall inserts or furniture you add to customize it for different rooms. You ship the house once, then mail lightweight insert kits to different locations, never reshipping the entire foundation.

Code

python

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import LoraConfig, get_peft_model
import tempfile
import os

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["c_attn"],
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, lora_config)
print("Model trainable parameters:")
model.print_trainable_parameters()  # prints the summary itself and returns None

with tempfile.TemporaryDirectory() as tmpdir:
    adapter_path = os.path.join(tmpdir, "my_adapter")
    model.save_pretrained(adapter_path)
    print(f"Adapter saved to: {adapter_path}")
    print(f"Files created: {os.listdir(adapter_path)}")
    
    # Reload a fresh base model, wrap it with the same config, then load the saved weights
    model_reload = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)
    model_reload = get_peft_model(model_reload, lora_config)
    model_reload.load_adapter(adapter_path, adapter_name="default")
    print("Adapter loaded successfully")
    # active_adapters is a property on a PEFT-wrapped model, not a method
    print(f"Adapter is active: {model_reload.active_adapters}")

Output

Model trainable parameters:
trainable params: 294,912 || all params: 124,734,720 || trainable%: 0.2364
Adapter saved to: /tmp/tmpXXXXXXXX/my_adapter
Files created: ['README.md', 'adapter_config.json', 'adapter_model.safetensors']
Adapter loaded successfully
Adapter is active: ['default']

What just happened?

The code created a rank-8 LoRA adapter (one type of PEFT method), wrapped the frozen GPT-2 model with it, saved the adapter weights (roughly 1 MB in float32) plus metadata to disk, then reloaded the base model and loaded the saved adapter weights back into it. The final check of active_adapters confirmed that the adapter was bound correctly.
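
The number of trainable parameters a LoRA config adds can be checked by hand. The sketch below assumes GPT-2 small's dimensions (12 transformer blocks, with c_attn projecting 768 features to 2304) and float32 storage; the helper name lora_param_count is made up for this illustration.

```python
def lora_param_count(in_features: int, out_features: int, rank: int, num_layers: int) -> int:
    """Parameters added by LoRA: one (rank x in) matrix A and one (out x rank) matrix B per layer."""
    per_layer = rank * in_features + out_features * rank
    return per_layer * num_layers

# Assumed GPT-2 small shapes: c_attn maps 768 -> 2304 (fused q, k, v) in each of 12 blocks.
params = lora_param_count(in_features=768, out_features=2304, rank=8, num_layers=12)
size_mb = params * 4 / 1e6  # float32 uses 4 bytes per parameter

print(f"{params:,} trainable parameters")
print(f"about {size_mb:.2f} MB in float32")
```

Doubling r doubles both numbers, which is a quick way to estimate adapter size before training.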

Common gotcha

Many developers call model.save_pretrained() after wrapping with PEFT and expect the full model to be saved. Instead, only the adapter is saved. If you need the full fine-tuned model, call model.merge_and_unload(), which returns the base model with the adapter weights folded in, and save that returned model; doing so gives up the storage savings that PEFT provides. The thing to remember is that save_pretrained() on a PEFT model is intentionally sparse.
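
What merging does to a single weight matrix can be sketched in plain Python. This illustrates the LoRA arithmetic, W' = W + (lora_alpha / r) * B * A, and is not PEFT's actual implementation; the tiny matrices and helper names are invented for the example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add_scaled(w, delta, scale):
    """Return w + scale * delta, element-wise."""
    return [[wi + scale * di for wi, di in zip(wr, dr)] for wr, dr in zip(w, delta)]

# Hypothetical 2x2 frozen weight with a rank-1 adapter (r=1, lora_alpha=2).
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight, shape (out, in)
B = [[0.5], [1.0]]             # LoRA B, shape (out, r)
A = [[2.0, -1.0]]              # LoRA A, shape (r, in)
scale = 2 / 1                  # lora_alpha / r
x = [[3.0], [4.0]]             # input column vector

# Unmerged: base path and adapter path are computed separately, then summed.
base_out = matmul(W, x)
adapter_out = matmul(B, matmul(A, x))
unmerged = add_scaled(base_out, adapter_out, scale)

# Merged: fold the adapter into the weight once, then do a single matmul.
W_merged = add_scaled(W, matmul(B, A), scale)
merged = matmul(W_merged, x)

print(unmerged == merged)
```

Once the adapter is folded into W there is no separate adapter left to save, which is why a merged model is as large as the base model.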

Error recovery

FileNotFoundError when loading adapter

The adapter path doesn't exist or adapter_config.json is missing. Verify that the save path is correct and contains both adapter_config.json and a weights file (adapter_model.safetensors in recent PEFT releases, adapter_model.bin in older ones). Use os.listdir(adapter_path) to confirm.
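
A pre-flight check along these lines can catch a bad path before any model is loaded. It is a minimal sketch using only the standard library; the function name missing_adapter_files is hypothetical, and the accepted weight file names are an assumption covering both newer (.safetensors) and older (.bin) PEFT releases.

```python
import os
import tempfile

# Weights file names: .safetensors in recent PEFT releases, .bin in older ones (assumed).
WEIGHT_FILES = ("adapter_model.safetensors", "adapter_model.bin")

def missing_adapter_files(adapter_path: str) -> list[str]:
    """Return a list of problems found in an adapter directory (empty if it looks complete)."""
    if not os.path.isdir(adapter_path):
        return [f"directory not found: {adapter_path}"]
    present = set(os.listdir(adapter_path))
    problems = []
    if "adapter_config.json" not in present:
        problems.append("adapter_config.json is missing")
    if not any(name in present for name in WEIGHT_FILES):
        problems.append("no weights file (" + " or ".join(WEIGHT_FILES) + ")")
    return problems

# Demo on a stand-in directory that holds only an empty config file.
with tempfile.TemporaryDirectory() as tmpdir:
    open(os.path.join(tmpdir, "adapter_config.json"), "w").close()
    demo_problems = missing_adapter_files(tmpdir)
    print(demo_problems)
```

An empty list means both expected files are present; it does not validate their contents.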

RuntimeError: 'adapter_model.bin' not found

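The adapter directory has no weights file, typically because load_adapter() ran before save_pretrained() had written it or was pointed at the wrong directory. Recent PEFT releases write adapter_model.safetensors by default, so the file name in the message may differ. Also make sure the base model is wrapped with get_peft_model(model, config) before you call load_adapter() on it.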

ValueError: adapter_name 'default' already exists

You called load_adapter() twice with the same adapter_name on the same model. Either use a different adapter_name (second argument) or create a fresh model instance.

KeyError with target_modules

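The module names in LoraConfig don't match the actual model architecture. For gpt2, use 'c_attn'; for llama, use 'q_proj', 'v_proj', etc. To find the correct names, loop over model.named_modules() and print each name; printing the call itself only shows a generator object. Depending on the PEFT version, this mismatch may surface as a ValueError rather than a KeyError.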
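
One way to narrow down candidate target_modules is to count the last component of each module name. The sketch below works on a hand-written stand-in list of GPT-2-style names rather than a real model, and candidate_targets is a hypothetical helper; with a loaded model you would build the list from model.named_modules().

```python
from collections import Counter

def candidate_targets(module_names):
    """Count the last path component of each name, e.g. 'c_attn' for 'transformer.h.0.attn.c_attn'."""
    return Counter(name.rsplit(".", 1)[-1] for name in module_names if name)

# Hand-written stand-in for [name for name, _ in model.named_modules()] on a GPT-2-style model.
names = [
    "transformer.h.0.attn.c_attn",
    "transformer.h.0.attn.c_proj",
    "transformer.h.0.mlp.c_fc",
    "transformer.h.1.attn.c_attn",
    "transformer.h.1.attn.c_proj",
    "transformer.h.1.mlp.c_fc",
]
counts = candidate_targets(names)
print(counts.most_common())
```

Names that repeat once per layer are the usual candidates; a name absent from the counts, such as 'q_proj' here, would not match anything in this model.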

Experienced dev note

In transformers 5.5.x, the integration between transformers and PEFT is seamless, but adapter placement across devices still deserves a check. If you move a model with adapters to a different device (model.to('cuda')), the adapters move too. If you use device_map='auto' for large models, however, adapters may not load on the same device as the layers they attach to. Always test adapter inference on your target device before deployment. Also, merging with model.merge_and_unload() is one-way and irreversible, so plan your save/load strategy before training to avoid accidentally merging adapters you needed to keep separate.

Check your understanding

You fine-tuned an adapter for Spanish sentiment analysis and another for English NER, both using the same base Llama model. You saved both adapters separately. Now at inference, you need to switch between tasks. What's the correct way to activate the Spanish adapter, and what does save_pretrained() write when more than one adapter is loaded?

Show answer hint

A correct answer covers: (1) use load_adapter(path, adapter_name='spanish') and then set_adapter('spanish') to switch; (2) on a PEFT-wrapped model, save_pretrained() never writes one combined file: each adapter goes to its own location (the adapter named 'default' in the target directory, others in subdirectories named after them), and the selected_adapters argument limits which ones are written. Saving each adapter right after training and validation keeps the artifacts easy to track.

VERSION transformers 5.5.x made breaking changes to adapter handling: AutoModel.from_pretrained() no longer accepts a peft_config parameter directly. You must call get_peft_model(model, config) after loading. This is intentional to reduce hidden state management. In transformers <5.0.0, some users attempted to pass peft_config to from_pretrained(); this will fail.

Loading and merging multiple adapters at inference time, where you dynamically switch between task-specific adapters without reloading the base model.
