Code Beginner easy · 5 min

Model versioning

What you will learn

Track and save different versions of your fine-tuned model so you can reproduce results and roll back to working versions.

Why this matters

Fine-tuning experiments iterate quickly: hyperparameter changes, data adjustments, and training runs produce different model weights. Without versioning, you lose track of which checkpoint actually performed best, and you cannot reproduce the exact model that worked in production three weeks ago.

Skip if: You do not need explicit versioning for throwaway prototypes or when you have only one final model you never update. Once you fine-tune more than once, versioning saves hours of debugging.

Explanation

What it is: Model versioning means saving a snapshot of your fine-tuned model weights, training configuration, and metadata at specific points (typically after each training run) with a unique identifier, so you can load, compare, and revert to any version later.

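How it works mechanically: When you call trainer.save_model(), the Hugging Face trainer writes model weights, config files, and tokenizer files to a directory. When you train with a PEFT/LoRA config, as in the example below, the weights written are the adapter weights and adapter config rather than a full copy of the base model. By naming that directory with a version identifier (epoch number, timestamp, or experiment ID), you create a retrievable checkpoint. You then load it back with an Auto class such as AutoModelForCausalLM.from_pretrained(), pointing it at that exact directory path or at a model hub repo.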
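The naming-and-retrieval part of this can be sketched with the standard library alone. The helper names `new_version_dir` and `list_versions`, and the `v_<timestamp>` layout, are assumptions for illustration; they are not part of any Hugging Face API.

```python
import os
import tempfile
from datetime import datetime


def new_version_dir(root: str, when: datetime) -> str:
    """Build a path like <root>/v_20250101_183000 (hypothetical layout)."""
    return os.path.join(root, "v_" + when.strftime("%Y%m%d_%H%M%S"))


def list_versions(root: str) -> list[str]:
    """Return the version directory names under root, oldest first."""
    names = [
        name for name in os.listdir(root)
        if name.startswith("v_") and os.path.isdir(os.path.join(root, name))
    ]
    # Zero-padded timestamps sort chronologically as plain strings.
    return sorted(names)


# Demo in a temporary directory so no real checkpoints are touched.
root = tempfile.mkdtemp()
for when in (datetime(2025, 1, 2, 9, 0, 0), datetime(2025, 1, 1, 18, 30, 0)):
    os.makedirs(new_version_dir(root, when))

versions = list_versions(root)
latest_path = os.path.join(root, versions[-1])
print(versions)
print(latest_path)
```

A path such as `latest_path` is what you would pass to from_pretrained() to load that version.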

When to use it: Save a checkpoint after every training run, and especially after each epoch if training takes hours. This lets you stop early if you notice overfitting, compare metrics across versions, and always have a fallback if a later experiment degrades performance.

Analogy

Think of it like git commits for your model. Each time you finish an experiment, you 'commit' the weights to a versioned folder. If a later experiment breaks everything, you can check out an earlier version and start from there.

Code

python

import os
import json
from datetime import datetime

import torch
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoTokenizer, AutoModelForCausalLM
from trl import SFTTrainer, SFTConfig

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)

lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM"
)

# SFTTrainer expects a datasets.Dataset, not a plain list of dicts
train_dataset = Dataset.from_list([
    {"text": "The capital of France is Paris. Paris is beautiful."},
    {"text": "Machine learning is about learning patterns from data."},
    {"text": "Python is a popular programming language for AI."},
])

# A timestamp gives every run its own directory and metadata file
version_id = datetime.now().strftime("%Y%m%d_%H%M%S")
output_dir = f"./model_versions/v_{version_id}"
metadata_path = f"./model_versions/metadata_{version_id}.json"

os.makedirs("./model_versions", exist_ok=True)

training_config = SFTConfig(
    output_dir=output_dir,
    num_train_epochs=1,
    per_device_train_batch_size=2,
    dataset_text_field="text",  # dataset options belong on SFTConfig
    max_seq_length=128,  # newer trl releases call this max_length
    save_strategy="no",  # no intermediate checkpoints; save once at the end
    logging_steps=1,
)

trainer = SFTTrainer(
    model=model,
    args=training_config,
    train_dataset=train_dataset,
    peft_config=lora_config,
    processing_class=tokenizer,  # trl >= 0.12; older releases use tokenizer=
)

trainer.train()

# With peft_config set, this writes the LoRA adapter weights and
# adapter_config.json (plus tokenizer files), not the full base model
trainer.save_model(output_dir)

metadata = {
    "version_id": version_id,
    "timestamp": datetime.now().isoformat(),
    "model_name": model_name,
    "lora_config": {
        "r": lora_config.r,
        "lora_alpha": lora_config.lora_alpha,
        "lora_dropout": lora_config.lora_dropout,
    },
    "num_samples": len(train_dataset),
    "num_epochs": training_config.num_train_epochs,
}

with open(metadata_path, "w") as f:
    json.dump(metadata, f, indent=2)

print(f"Model saved to: {output_dir}")
print(f"Metadata saved to: {metadata_path}")

# Reload to verify the round trip. With peft installed, transformers reads
# adapter_config.json, loads the base model, and attaches the adapter
loaded_model = AutoModelForCausalLM.from_pretrained(output_dir)
print(f"\nModel loaded successfully from {output_dir}")
print(f"Model type: {type(loaded_model)}")

Output

Model saved to: ./model_versions/v_20260415_143022
Metadata saved to: ./model_versions/metadata_20260415_143022.json

Model loaded successfully from ./model_versions/v_20260415_143022
Model type: <class 'transformers.models.gpt2.modeling_gpt2.GPT2LMHeadModel'>

What just happened?

The trainer fine-tuned the model for 1 epoch, then saved the result to a timestamped directory. Because a peft_config was passed, what lands on disk is the LoRA adapter weights and adapter config, not a full copy of the base model. A JSON metadata file captured the hyperparameters and run details. The code then reloaded the model from that directory to verify that the save-and-load cycle works. Each run creates a new versioned folder, so nothing overwrites previous experiments.

Common gotcha

Many developers save only the final model and overwrite it on the next run. When they later realize the previous experiment was actually better, those weights are gone. Save every checkpoint to a unique directory and never reuse the same output_dir path.
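One way to enforce this is to create the version directory yourself before training and let the call fail if the path is already taken. The sketch below is a minimal illustration; `create_fresh_dir` and the `v_run_a` name are hypothetical.

```python
import os
import tempfile


def create_fresh_dir(path: str) -> str:
    """Create path, raising FileExistsError if it already exists."""
    os.makedirs(path, exist_ok=False)
    return path


# Demo in a temporary directory.
root = tempfile.mkdtemp()
target = os.path.join(root, "v_run_a")  # hypothetical version name
create_fresh_dir(target)

try:
    create_fresh_dir(target)  # a second run reusing the same path
    overwritten = True
except FileExistsError:
    overwritten = False

print("overwrite blocked:", not overwritten)
```

Failing loudly here costs one rerun with a new name; silently overwriting costs the earlier weights.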

Error recovery

FileNotFoundError

You tried to load from an output_dir that doesn't exist or was deleted. Double-check the exact path and ensure the save_model() call completed without error.

RuntimeError: Expected all tensors to be on the same device

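The model and the tensors you pass to it are on different devices, for example a model on the GPU receiving inputs that are still on the CPU. from_pretrained() loads onto the CPU by default, regardless of where the model was trained. Explicitly move the loaded model to your target device with `model.to('cpu')` or `model.to('cuda')`, and move the input tensors to the same device.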

json.JSONDecodeError reading metadata

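The metadata JSON file was corrupted or truncated, which can happen if the process crashed during the write. Reconstruct the metadata manually from trainer.args if the session is still alive, or from the files saved in the version directory (adapter_config.json, and training_args.bin if present). Otherwise, accept that the metadata for that version is lost.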
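A common way to avoid truncated files is to write to a temporary file and rename it into place only after the write has finished. The sketch below assumes the temporary file and the destination are on the same filesystem; `write_json_atomic` is a hypothetical helper name.

```python
import json
import os
import tempfile


def write_json_atomic(path: str, payload: dict) -> None:
    """Write payload to a temp file, then rename it over path."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # readers see the old file or the new one
    except BaseException:
        # Clean up the partial temp file, then re-raise the original error.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Demo in a temporary directory with placeholder metadata.
root = tempfile.mkdtemp()
metadata_path = os.path.join(root, "metadata_example.json")
write_json_atomic(metadata_path, {"version_id": "example", "num_epochs": 1})

with open(metadata_path) as f:
    restored = json.load(f)
print(restored)
```

If the process dies mid-write, the worst case is a leftover `.tmp` file next to an intact (or absent) metadata file.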

Experienced dev note

Save metadata (hyperparameters, loss, eval metrics) alongside the model weights in a structured format. Six months later, you will not remember why v_20250901 was better than v_20250830. A small JSON file prevents that pain. Also: prefer automatic per-epoch saving (save_strategy="epoch" in the training config, or a trainer callback) over manual save_model() calls. Automatic checkpoints are written as training progresses, so an interrupted run still leaves the epochs that completed.
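Once every version has a metadata file, picking the best one becomes a short script. The sketch below assumes each metadata file also records an `eval_loss` field, which the example earlier on this page does not write. The `best_version` helper and the numbers in the demo are placeholders, not real results.

```python
import glob
import json
import os
import tempfile


def best_version(root: str, metric: str = "eval_loss") -> dict:
    """Return the metadata record with the lowest value of metric."""
    records = []
    for path in glob.glob(os.path.join(root, "metadata_*.json")):
        with open(path) as f:
            record = json.load(f)
        if metric in record:  # skip runs that never logged the metric
            records.append(record)
    if not records:
        raise ValueError(f"no metadata file under {root} contains {metric!r}")
    return min(records, key=lambda r: r[metric])


# Demo with made-up metric values in a temporary directory.
root = tempfile.mkdtemp()
demo_records = [
    {"version_id": "20250830_101500", "eval_loss": 2.41},
    {"version_id": "20250901_093000", "eval_loss": 2.17},
]
for record in demo_records:
    name = f"metadata_{record['version_id']}.json"
    with open(os.path.join(root, name), "w") as f:
        json.dump(record, f)

winner = best_version(root)
print(winner["version_id"])
```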

Check your understanding

You fine-tuned a model with different LoRA ranks (r=4 and r=8) in two separate runs. Both versions are saved in separate directories. How would you programmatically load both models and confirm they have different LoRA dimensions without relying on directory names?

Show answer hint

Load both models, then inspect the state_dict and compare the shapes of the LoRA weight matrices (the lora_A and lora_B tensors have one dimension equal to the rank). The adapter_config.json saved inside each directory, or your metadata JSON, also records the LoRA rank in its r field.
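The config-file route needs no model loading at all. The sketch below reads the `r` field from each directory's adapter_config.json. The `run_a` and `run_b` directories are stand-ins created for the demo and contain only the fields the sketch needs; a real adapter config written by PEFT has many more.

```python
import json
import os
import tempfile


def read_lora_rank(version_dir: str) -> int:
    """Read the LoRA rank from adapter_config.json inside version_dir."""
    with open(os.path.join(version_dir, "adapter_config.json")) as f:
        return json.load(f)["r"]


# Stand-in version directories with minimal adapter configs.
root = tempfile.mkdtemp()
for name, rank in (("run_a", 4), ("run_b", 8)):
    os.makedirs(os.path.join(root, name))
    with open(os.path.join(root, name, "adapter_config.json"), "w") as f:
        json.dump({"peft_type": "LORA", "r": rank}, f)

ranks = {
    name: read_lora_rank(os.path.join(root, name))
    for name in ("run_a", "run_b")
}
print(ranks)
print("ranks differ:", ranks["run_a"] != ranks["run_b"])
```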

VERSION This example was written against trl 0.12.x. The argument names on SFTConfig and SFTTrainer have changed across trl releases (for example, which options belong on SFTConfig versus SFTTrainer), so trainer instantiation syntax may differ on other versions. Always pin the trl version in requirements.txt.

Next, learn how to evaluate your fine-tuned checkpoint against a test set so you can compare versions objectively instead of guessing which one performs best.
