HuggingFace Hub publishing
Why this matters
Fine-tuning a model has limited value if it only lives on your machine. Publishing to the Hub makes your model discoverable, reproducible, and instantly loadable by others. It also handles version control and keeps the tokenizer and config files alongside the weights.
Explanation
The HuggingFace Hub is a centralized model repository where you version, share, and collaborate on models. When you push a fine-tuned model, you upload not just the weights but also the tokenizer, config.json, and training metadata. This means anyone with access to the repo can load your model in a single line, which for a causal language model is AutoModelForCausalLM.from_pretrained('your_username/model_name').
Mechanically, the push_to_hub machinery (which TRL's SFTTrainer inherits from the transformers Trainer) handles Git-based versioning behind the scenes. Each push creates a new timestamped commit, so earlier versions stay available in the repo history. You can set privacy (public/private), add a model card describing your approach, and even create multiple branches for different fine-tuning runs. Depending on the model and Hugging Face's current offerings, the Hub may also serve it through hosted inference, making it accessible via API without you hosting anything.
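A minimal sketch of the branch-per-run workflow described above, using the huggingface_hub `HfApi` client. It assumes huggingface_hub is installed and that you are logged in with a write token; `branch_name_for_run` and `push_run_to_branch` are hypothetical helpers, and the branch naming scheme is only an illustration.

```python
import re


def branch_name_for_run(learning_rate: float, epochs: int) -> str:
    """Hypothetical naming scheme: one Git branch per fine-tuning run."""
    name = f"run-lr{learning_rate:.0e}-ep{epochs}"  # 2e-4 -> "2e-04"
    # Replace anything that is not safe in a branch name
    return re.sub(r"[^A-Za-z0-9._-]", "-", name)


def push_run_to_branch(repo_id: str, folder: str, branch: str, private: bool = True) -> None:
    """Upload a local output folder as a new commit on a per-run branch."""
    # Imported here so branch_name_for_run works without huggingface_hub installed
    from huggingface_hub import HfApi

    api = HfApi()  # picks up the token from login() or HF_TOKEN
    # `private` only takes effect when the repo is first created
    api.create_repo(repo_id, private=private, exist_ok=True)
    api.create_branch(repo_id, branch=branch, exist_ok=True)
    api.upload_folder(
        repo_id=repo_id,
        folder_path=folder,
        revision=branch,
        commit_message=f"Upload fine-tuning run {branch}",
    )
    # Every upload is a commit; print this branch's history
    for commit in api.list_repo_commits(repo_id, revision=branch):
        print(commit.commit_id[:8], commit.created_at, commit.title)
```

For example, `push_run_to_branch("neural-base-demo/gpt2-finetuned", "./final_adapter", branch_name_for_run(2e-4, 2))` pushes one run to the branch `run-lr2e-04-ep2`; transformers' `from_pretrained` accepts a `revision` argument to load a specific branch.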
Use this when you want your fine-tuned work to be production-ready and shareable. Before training, authenticate with a Hub access token that has write permission and set a repo ID; then either let the trainer push automatically or push the artifacts manually after inspecting them. For organizational models, use an organization repo ID (org_name/model_name) instead of a personal one.
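For the manual route, one way to push after inspection is to check the saved folder for the files a LoRA run should produce before uploading anything. This is a minimal sketch: the `REQUIRED_FILES` names assume recent PEFT and transformers defaults (older PEFT releases wrote `adapter_model.bin` instead), and `missing_artifacts` and `push_after_inspection` are hypothetical helpers.

```python
from pathlib import Path

# Assumed file names for a LoRA output folder saved with a tokenizer (recent library defaults)
REQUIRED_FILES = ("adapter_config.json", "adapter_model.safetensors", "tokenizer_config.json")


def missing_artifacts(folder: str, required: tuple[str, ...] = REQUIRED_FILES) -> list[str]:
    """Return the required files that are absent from a local output folder."""
    root = Path(folder)
    return [name for name in required if not (root / name).is_file()]


def push_after_inspection(folder: str, repo_id: str, private: bool = True) -> None:
    """Upload a saved folder only if it passes the file check."""
    missing = missing_artifacts(folder)
    if missing:
        raise FileNotFoundError(f"Not pushing {folder}: missing {missing}")
    # Imported here so the file check works without huggingface_hub installed
    from huggingface_hub import HfApi

    api = HfApi()  # needs a token with write access
    # Pass "my-org/model-name" as repo_id to publish under an organization
    api.create_repo(repo_id, private=private, exist_ok=True)
    api.upload_folder(repo_id=repo_id, folder_path=folder, commit_message="Manual push after inspection")
```

For example, after training call `trainer.save_model("./final_adapter")`, which writes the adapter and, if one was given to the trainer, the tokenizer, then `push_after_inspection("./final_adapter", "my-org/gpt2-finetuned")`. Uploading a dedicated folder rather than `output_dir` avoids pushing the intermediate `checkpoint-*` directories.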
Analogy
Publishing to the Hub is like pushing your trained model to a public GitHub repo, but with model-aware tooling built in. Every push is a new version (commit), teammates can download it instantly, and anyone with access can load it by name, so you're sharing not just code but a live, loadable model.
Code
from transformers import AutoTokenizer, AutoModelForCausalLM
from trl import SFTTrainer, SFTConfig
from peft import LoraConfig
from datasets import Dataset
# Log in to the HuggingFace Hub first (once per environment) with a write-access token,
# or set the HF_TOKEN environment variable instead
# from huggingface_hub import login
# login(token="hf_your_token_here")
# Create a minimal training dataset
dataset_dict = {
    "text": [
        "The quick brown fox jumps over the lazy dog. This is a fine-tuning example.",
        "Machine learning models require careful tuning and validation steps.",
        "HuggingFace transformers make it easy to fine-tune large language models.",
    ]
}
dataset = Dataset.from_dict(dataset_dict)
# Configure model and training
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# Add padding token for GPT-2
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
# LoRA config for efficient fine-tuning
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["c_attn", "c_proj"],
    fan_in_fan_out=True,  # GPT-2 uses Conv1D layers, which store weights as (fan_in, fan_out)
    bias="none",
    task_type="CAUSAL_LM"
)
# Training config WITH hub publishing
sft_config = SFTConfig(
    output_dir="./fine_tuned_gpt2",
    num_train_epochs=2,
    per_device_train_batch_size=2,
    max_length=128,  # called max_seq_length in older TRL releases
    learning_rate=2e-4,
    dataset_text_field="text",  # passed to SFTTrainer directly in older TRL releases
    # Hub publishing settings
    push_to_hub=True,
    hub_model_id="neural-base-demo/gpt2-finetuned",  # replace with a namespace you can write to
    hub_strategy="every_save",
    hub_private_repo=False,
    save_strategy="epoch",
    save_total_limit=2,
)
# Create trainer
trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,  # called tokenizer= in older TRL releases
    args=sft_config,
    train_dataset=dataset,
    peft_config=lora_config,
)
# Train; with hub_strategy="every_save", each epoch checkpoint is also pushed in the background
trainer.train()
# Final push: uploads the trained adapter, tokenizer, and a draft model card
trainer.push_to_hub()
# Verify the push by listing the files that actually exist in the Hub repo
from huggingface_hub import list_repo_files
print(list_repo_files("neural-base-demo/gpt2-finetuned"))
# Load the model back from the Hub; the repo holds a LoRA adapter, so load it through PEFT
from peft import AutoPeftModelForCausalLM
downloaded_model = AutoPeftModelForCausalLM.from_pretrained("neural-base-demo/gpt2-finetuned")
print("Successfully loaded model from Hub")
What just happened?
The SFTTrainer was initialized with `push_to_hub=True` and `hub_model_id` set. During training, after each epoch (because `save_strategy="epoch"`), the trainer saved a local checkpoint and, because `hub_strategy="every_save"`, pushed it to the specified Hub repo in the background so training was not blocked. After training, the model could be loaded back from the Hub by its repo ID. Because LoRA was used, the repo holds the adapter weights and adapter config rather than full model weights, plus the tokenizer files, a draft model card, and a Git history of all pushes.
Common gotcha
The most common mistake is forgetting to authenticate before pushing. Run `huggingface_hub.login(token="hf_...")` once in your environment BEFORE training starts, or set the HF_TOKEN environment variable, and make sure the token has write access. Skipping this does not fail silently: the trainer typically raises an authentication error as soon as it tries to create the Hub repo, before training begins. Second gotcha: with `hub_strategy="every_save"` and a small `save_steps`, every checkpoint save can become a commit, so a long run can leave hundreds of commits on the Hub. Use `"end"` (and push once with `trainer.push_to_hub()`), or keep `"every_save"` with a large save interval for a cleaner history.
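Both gotchas can be checked before a long run starts. A minimal sketch, assuming huggingface_hub is installed: `check_hub_login` fails fast if no usable token is found (it does not verify that the token has write scope), and `estimated_hub_commits` is a hypothetical back-of-the-envelope helper. Its `"every_save"` figure is an upper bound, because a new push is skipped while the previous one is still uploading.

```python
def estimated_hub_commits(hub_strategy: str, max_steps: int, save_steps: int) -> int:
    """Hypothetical upper bound on training-time Hub commits."""
    if hub_strategy == "end":
        # Only the final save_model() / push_to_hub() call pushes
        return 1
    if hub_strategy == "every_save":
        # Roughly one push per checkpoint save (fewer if uploads overlap)
        return max_steps // save_steps
    raise ValueError(f"not covered by this sketch: {hub_strategy!r}")


def check_hub_login() -> str:
    """Fail fast before training if no usable token is found; returns the account name."""
    # Imported here so estimated_hub_commits works without huggingface_hub installed
    from huggingface_hub import HfApi

    # Raises if the token from login() or HF_TOKEN is missing or rejected
    return HfApi().whoami()["name"]
```

Calling `check_hub_login()` before building the trainer surfaces credential problems in seconds; `estimated_hub_commits("every_save", 10_000, 50)` returns 200, a sign that the save interval is too small for a clean history.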
Error recovery
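RepositoryNotFoundError: the repo ID is wrong, or the repo is private and you are not authenticated with an account that can access it. Double-check the ID and your login.
OSError: Can't find a tokenizer config file: the tokenizer files were never pushed, for example because no tokenizer was passed to the trainer. Push it with tokenizer.push_to_hub(repo_id).
huggingface_hub.utils.HfHubHTTPError 401: the request was not authenticated because your token is missing, expired, or invalid. Log in again or reset HF_TOKEN.
Experienced dev note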
Publishing a fine-tuned model is not just about sharing weights; it's about reproducibility. Always include a model card (the trainer generates a draft, but you should edit it on the Hub afterward) describing your dataset, training hyperparameters, and expected performance. Many teams version separate fine-tuning runs by appending `-v1`, `-v2` to the Hub repo ID rather than using branches. Also, if you're fully fine-tuning a very large model (70B+ parameters), consider a private repo and set `hub_strategy` to `"end"`: the bf16 weights alone are roughly 140 GB, so pushing every checkpoint means repeated uploads that can eat into your storage and take hours each.
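The suffix-versioning convention and the checkpoint-size arithmetic above can be sketched as follows. `next_versioned_repo_id`, `checkpoint_size_gb`, and `existing_repo_ids` are hypothetical helpers; the size estimate counts weights only (2 bytes per parameter for bf16 or fp16), so real Trainer checkpoints that include optimizer state are larger.

```python
import re


def next_versioned_repo_id(base_id: str, existing_ids: list[str]) -> str:
    """Return base_id with the next unused -vN suffix (hypothetical naming helper)."""
    pattern = re.compile(re.escape(base_id) + r"-v(\d+)$")
    versions = []
    for repo_id in existing_ids:
        match = pattern.match(repo_id)
        if match:
            versions.append(int(match.group(1)))
    return f"{base_id}-v{max(versions, default=0) + 1}"


def checkpoint_size_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of the weights alone; 2 bytes/param corresponds to bf16 or fp16."""
    return num_params * bytes_per_param / 1e9


def existing_repo_ids(author: str, search: str) -> list[str]:
    """List your current model repo IDs on the Hub (needs network access)."""
    # Imported here so the pure helpers work without huggingface_hub installed
    from huggingface_hub import HfApi

    return [model.id for model in HfApi().list_models(author=author, search=search)]
```

For example, `next_versioned_repo_id("neural-base-demo/gpt2-finetuned", existing_repo_ids("neural-base-demo", "gpt2-finetuned"))` picks the next free suffix, and `checkpoint_size_gb(70e9)` returns 140.0, the roughly 140 GB each full push would upload.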
Check your understanding
Your fine-tuned model is now on Hub, but a collaborator loads it and gets different tokenization results than your training logs show. What are two likely causes, and how would you verify which one broke reproducibility?
Answer hint
A correct answer identifies: (1) the tokenizer files (tokenizer.json or special_tokens_map.json) weren't pushed, so the collaborator ended up with a different tokenizer, such as the base model's (verify with `huggingface_hub.list_repo_files()`); (2) the tokenizer was pushed, but its tokenizer_config.json has different `padding_side` or `truncation_side` settings than training used (compare `tokenizer.init_kwargs` or `tokenizer.padding_side` before pushing and after loading). Both require inspecting what actually exists in the Hub repo files, not just the model weights.
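The two checks from the hint can be scripted. This is a minimal sketch: `TOKENIZER_KEYS` and `TOKENIZER_FILES` are assumed, non-exhaustive lists (adjust them to what your tokenizer actually saves), and `tokenizer_settings`, `diff_settings`, and `check_hub_tokenizer` are hypothetical helpers. `check_hub_tokenizer` needs network access plus huggingface_hub and transformers.

```python
# Settings that commonly change tokenization results (assumed, not exhaustive)
TOKENIZER_KEYS = ("padding_side", "truncation_side", "pad_token", "eos_token", "model_max_length")
# Assumed file names; match them to what your tokenizer's save_pretrained writes
TOKENIZER_FILES = ("tokenizer_config.json", "special_tokens_map.json")


def tokenizer_settings(tokenizer) -> dict:
    """Snapshot the settings above from a tokenizer (None for any that are absent)."""
    return {key: getattr(tokenizer, key, None) for key in TOKENIZER_KEYS}


def diff_settings(trained: dict, loaded: dict) -> dict:
    """Return {key: (trained_value, loaded_value)} for every setting that differs."""
    return {
        key: (trained.get(key), loaded.get(key))
        for key in trained.keys() | loaded.keys()
        if trained.get(key) != loaded.get(key)
    }


def check_hub_tokenizer(repo_id: str, trained_settings: dict) -> dict:
    """Report missing tokenizer files on the Hub, then diff the loaded tokenizer's settings."""
    from huggingface_hub import list_repo_files
    from transformers import AutoTokenizer

    files = set(list_repo_files(repo_id))
    missing = [name for name in TOKENIZER_FILES if name not in files]
    if missing:
        print(f"Tokenizer files missing from {repo_id}: {missing}")
    loaded = tokenizer_settings(AutoTokenizer.from_pretrained(repo_id))
    return diff_settings(trained_settings, loaded)
```

Snapshot the settings right after training with `trained = tokenizer_settings(tokenizer)`; an empty dict from `check_hub_tokenizer("neural-base-demo/gpt2-finetuned", trained)` means the Hub copy matches on those keys.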