Model versioning
Why this matters
Fine-tuning experiments iterate quickly: hyperparameter changes, data adjustments, and training runs produce different model weights. Without versioning, you lose track of which checkpoint actually performed best, and you cannot reproduce the exact model that worked in production three weeks ago.
Explanation
What it is: Model versioning means saving a snapshot of your fine-tuned model weights, training configuration, and metadata at specific points, typically after each training run, under a unique identifier so you can load, compare, and revert to any version later.
How it works mechanically: When you call trainer.save_model(), the Hugging Face trainer writes model weights, config files, and tokenizer files to a directory. When you train with a PEFT adapter such as LoRA, it writes only the adapter weights and adapter config rather than the full base model. By naming that directory with a version identifier (epoch number, timestamp, experiment ID), you create a retrievable checkpoint. You then load it back with AutoModel.from_pretrained(), pointing it at that exact directory path or at a model hub repo.
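The naming scheme described above can be sketched with the standard library alone. The helper names new_version_dir and latest_version_dir and the v_<timestamp> layout are illustrative assumptions, not part of any Hugging Face API. The path they return is what you would hand to save_model() and later to from_pretrained().

```python
from datetime import datetime
from pathlib import Path


def new_version_dir(base_dir, now=None):
    """Return a timestamped path such as base_dir/v_20260415_143022 (hypothetical layout)."""
    now = now or datetime.now()
    return Path(base_dir) / f"v_{now.strftime('%Y%m%d_%H%M%S')}"


def latest_version_dir(base_dir):
    """Return the most recent v_* directory under base_dir, or None if there is none.

    Timestamps in the YYYYMMDD_HHMMSS format sort lexicographically in
    chronological order, so a plain sort is enough.
    """
    versions = sorted(p for p in Path(base_dir).glob("v_*") if p.is_dir())
    return versions[-1] if versions else None
```

Because the version identifier lives in the path, reverting to an older model is just a matter of passing a different directory to the loader.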
When to use it: Save a checkpoint after every training run, and especially after each epoch if training takes hours. This lets you stop early if you notice overfitting, compare metrics across versions, and always have a fallback if a later experiment degrades performance.
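As a rough illustration of why per-epoch checkpoints help, the sketch below picks the epoch with the lowest evaluation loss and flags when the loss has stopped improving. It assumes you have collected one eval loss per epoch yourself; the function names, the patience rule, and the numbers are made up for the example.

```python
def best_epoch(eval_losses):
    """Return the 0-based index of the epoch with the lowest eval loss."""
    return min(range(len(eval_losses)), key=lambda i: eval_losses[i])


def should_stop(eval_losses, patience=2):
    """Return True if eval loss has not improved for `patience` epochs in a row."""
    if len(eval_losses) <= patience:
        return False
    best_before = min(eval_losses[:-patience])
    return all(loss >= best_before for loss in eval_losses[-patience:])


# Made-up per-epoch eval losses, for illustration only.
losses = [2.1, 1.7, 1.5, 1.6, 1.8]
print(best_epoch(losses))   # 2
print(should_stop(losses))  # True
```

With a checkpoint saved for every epoch, the one at index best_epoch is the fallback to keep when later epochs start to overfit.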
Analogy
Think of it like git commits for your model. Each time you finish an experiment, you 'commit' the weights to a versioned folder. If a later experiment breaks everything, you can check out an earlier version and start from there.
Code
import os
import json
from datetime import datetime

import torch
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoTokenizer, AutoModelForCausalLM
from trl import SFTTrainer, SFTConfig

# Load a small base model and its tokenizer.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)

# LoRA adapter settings; these are also recorded in the metadata below.
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
)

# SFTTrainer expects a datasets.Dataset rather than a plain list, so wrap the toy examples.
train_dataset = Dataset.from_list([
    {"text": "The capital of France is Paris. Paris is beautiful."},
    {"text": "Machine learning is about learning patterns from data."},
    {"text": "Python is a popular programming language for AI."},
])

# One timestamped directory per run, so no run overwrites another.
version_id = datetime.now().strftime("%Y%m%d_%H%M%S")
output_dir = f"./model_versions/v_{version_id}"
metadata_path = f"./model_versions/metadata_{version_id}.json"
os.makedirs("./model_versions", exist_ok=True)

# Argument names vary across TRL releases: some newer versions rename
# max_seq_length and expect processing_class= instead of tokenizer=.
training_config = SFTConfig(
    output_dir=output_dir,
    num_train_epochs=1,
    per_device_train_batch_size=2,
    max_seq_length=128,
    dataset_text_field="text",
    save_strategy="no",  # no intermediate checkpoints; we save once after training
    logging_steps=1,
)

trainer = SFTTrainer(
    model=model,
    args=training_config,
    train_dataset=train_dataset,
    peft_config=lora_config,
    tokenizer=tokenizer,
)

trainer.train()
trainer.save_model(output_dir)  # with peft_config set, this writes the LoRA adapter

# Record the settings that produced this version.
metadata = {
    "version_id": version_id,
    "timestamp": datetime.now().isoformat(),
    "model_name": model_name,
    "lora_config": {
        "r": lora_config.r,
        "lora_alpha": lora_config.lora_alpha,
        "lora_dropout": lora_config.lora_dropout,
    },
    "num_samples": len(train_dataset),
    "num_epochs": training_config.num_train_epochs,
}
with open(metadata_path, "w") as f:
    json.dump(metadata, f, indent=2)

print(f"Model saved to: {output_dir}")
print(f"Metadata saved to: {metadata_path}")

# Reload to verify the save-and-load cycle. With peft installed, from_pretrained
# loads the base model and attaches the adapter stored in output_dir.
loaded_model = AutoModelForCausalLM.from_pretrained(output_dir)
print(f"\nModel loaded successfully from {output_dir}")
print(f"Model type: {type(loaded_model)}")

Output
Model saved to: ./model_versions/v_20260415_143022
Metadata saved to: ./model_versions/metadata_20260415_143022.json

Model loaded successfully from ./model_versions/v_20260415_143022
Model type: <class 'transformers.models.gpt2.modeling_gpt2.GPT2LMHeadModel'>
What just happened?
The trainer fine-tuned the model for 1 epoch, then saved the result to a timestamped directory. Because the run uses a LoRA peft_config, what lands on disk is the adapter weights and adapter config, not a full copy of the base model. A JSON metadata file captured the LoRA hyperparameters and run details. The code then reloaded the model from that directory to verify that the save-and-load cycle works. Each run creates a new versioned folder, so nothing overwrites previous experiments.
Common gotcha
Many developers save only the final model and overwrite it on the next run. When they later realize the previous experiment was actually better, the weights are gone. Always save every checkpoint to a unique directory, and never reuse the same output_dir path.
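One way to guard against accidental reuse is to make directory creation fail when the path already exists. This is a minimal sketch, and claim_output_dir is a hypothetical helper name, not a library function.

```python
from pathlib import Path


def claim_output_dir(path):
    """Create `path` as a new directory and fail loudly if it already exists."""
    out = Path(path)
    # exist_ok=False makes mkdir raise FileExistsError instead of silently
    # reusing a directory that may hold an earlier run's weights.
    out.mkdir(parents=True, exist_ok=False)
    return out
```

Calling it once per run, before training starts, turns a silent overwrite into an immediate error.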
Error recovery
FileNotFoundError
RuntimeError: Expected all tensors to be on the same device
json.JSONDecodeError reading metadata
Experienced dev note
Save metadata (hyperparameters, loss, eval metrics) alongside the model weights in a structured format. Six months later, you will not remember why v_20250901 was better than v_20250830. A small JSON file prevents that pain. Also, let the trainer save automatically after each epoch (for example with save_strategy="epoch" or a trainer callback) rather than relying on manual save_model() calls. Automatic saves need no extra code in your training script, and checkpoints already written to disk remain available if the run is interrupted.
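A minimal sketch of that idea follows. It assumes each version directory is named v_* and holds its own metadata.json with an eval_loss entry. The layout, the key name, and both helper names are assumptions made for illustration; the training script above writes its metadata file next to the version directory instead.

```python
import json
from pathlib import Path


def write_metadata(version_dir, metadata):
    """Write metadata.json inside the version directory, next to the weights."""
    path = Path(version_dir) / "metadata.json"
    path.write_text(json.dumps(metadata, indent=2))
    return path


def best_version(base_dir, metric="eval_loss"):
    """Return (version_dir, value) for the lowest `metric`, or None if nothing is readable."""
    best = None
    for meta_path in sorted(Path(base_dir).glob("v_*/metadata.json")):
        try:
            value = json.loads(meta_path.read_text())[metric]
        except (json.JSONDecodeError, KeyError):
            continue  # corrupt or incomplete metadata: skip this version
        if best is None or value < best[1]:
            best = (meta_path.parent, value)
    return best
```

Keeping the metric in the metadata means the comparison between versions is a short script rather than a search through old logs.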
Check your understanding
You fine-tuned a model with different LoRA ranks (r=4 and r=8) in two separate runs. Both versions are saved in separate directories. How would you programmatically load both models and confirm they have different LoRA dimensions without relying on directory names?
Answer hint
Read the adapter_config.json that PEFT saves inside each directory and compare the r field. Alternatively, load both models and compare the shapes of the LoRA weight matrices in state_dict(): each lora_A weight has r rows. The metadata JSON written by the training script also records the LoRA rank used.
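The config-file route can be sketched without loading any weights. This assumes both directories were saved from PEFT LoRA models and therefore contain an adapter_config.json with an r entry; lora_rank and ranks_differ are hypothetical helper names.

```python
import json
from pathlib import Path


def lora_rank(adapter_dir):
    """Read the LoRA rank `r` from the adapter_config.json saved with the adapter."""
    config = json.loads((Path(adapter_dir) / "adapter_config.json").read_text())
    return config["r"]


def ranks_differ(dir_a, dir_b):
    """Return True if the two saved adapters were trained with different LoRA ranks."""
    return lora_rank(dir_a) != lora_rank(dir_b)
```

Reading the rank from the saved config keeps the check independent of how the directories happen to be named.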