How to fine-tune Llama 3 with Hugging Face
Quick answer
To fine-tune Llama 3 with Hugging Face Transformers, load the pretrained model and tokenizer from the Hugging Face Hub, prepare your dataset in the correct format, and train on your data with the Trainer API or the transformers example training scripts. This involves setting up a training run with appropriate hyperparameters and saving the fine-tuned model for inference.
Prerequisites
- Python 3.8+
- pip install transformers datasets accelerate bitsandbytes
- Access to a GPU-enabled environment
- Familiarity with the Hugging Face Transformers library
Setup environment
Install the necessary Python packages and set up your environment for fine-tuning Llama 3. Use pip to install transformers, datasets, and accelerate for distributed training support. Also install bitsandbytes, which provides 8-bit quantization and 8-bit optimizers to reduce memory usage.
pip install transformers datasets accelerate bitsandbytes
Step by step fine-tuning
Load the pretrained Llama 3 model and tokenizer from Hugging Face Hub, prepare your dataset, and fine-tune using the Trainer API. This example uses a text dataset and fine-tunes with causal language modeling.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Load tokenizer and model (Llama 3 ships in 8B and 70B sizes; the repo is gated,
# so accept the license on the Hub and authenticate first)
model_name = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers define no pad token
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_8bit=True)

# Load dataset (example: wikitext)
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")

# Tokenize function
def tokenize_function(examples):
    return tokenizer(examples["text"], truncation=True, max_length=512)

tokenized_datasets = dataset.map(tokenize_function, batched=True, remove_columns=["text"])

# Collator pads batches and copies input_ids into labels for causal language modeling
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

# Training arguments
training_args = TrainingArguments(
    output_dir="./llama3-finetuned",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=3,
    weight_decay=0.01,
    save_total_limit=2,
    save_strategy="epoch",
    logging_dir="./logs",
    logging_steps=10,
    fp16=True,
)

# Initialize Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
)

# Start training
trainer.train()

# Save the fine-tuned model
trainer.save_model("./llama3-finetuned")
Output
***** Running training *****
  Num examples = 21128
  Num Epochs = 3
  Instantaneous batch size per device = 4
  Total train batch size (w. parallel, distributed) = 4
  Gradient Accumulation steps = 1
  Total optimization steps = 15846
...
Training completed. Model saved to ./llama3-finetuned
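The objective being optimized above is plain next-token prediction: the collator copies input_ids into labels, and the model shifts them internally so each token is trained to predict its successor. A minimal pure-Python sketch of that input/target pairing (toy token ids, no real tokenizer involved):

```python
# Sketch of the next-token pairs behind causal-LM fine-tuning
# (toy token ids, not output of a real tokenizer).
input_ids = [101, 2009, 2003, 1037, 3231]

inputs = input_ids[:-1]   # every position except the last
targets = input_ids[1:]   # the same sequence shifted left by one

pairs = list(zip(inputs, targets))  # (current token, token to predict)
```

The cross-entropy loss reported during training is averaged over exactly these pairs.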
Common variations
- Use the accelerate CLI for distributed multi-GPU training.
- Fine-tune with LoRA adapters to reduce GPU memory usage.
- Use Trainer callbacks for custom evaluation or logging.
- Stream training logs to TensorBoard or Weights & Biases via the report_to option in TrainingArguments.
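The LoRA variation above freezes the base weights and trains only a low-rank update. A toy pure-Python sketch of the idea (hypothetical sizes, not a real model):

```python
# Toy sketch of the LoRA idea: instead of updating a d x d weight W,
# train two low-rank factors B (d x r) and A (r x d); the effective
# weight is W + B @ A. Sizes here are hypothetical toy values.
d, r = 8, 2  # hidden size and LoRA rank

def matmul(X, Y):
    """Naive matrix multiply for small nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

# Frozen base weight (identity here, standing in for a pretrained projection)
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]

# Trainable low-rank factors: B is zero-initialized so training starts
# exactly at the base model; A gets a small nonzero init.
B = [[0.0] * r for _ in range(d)]          # d x r
A = [[0.1] * d for _ in range(r)]          # r x d

delta = matmul(B, A)                       # d x d low-rank update
W_eff = [[W[i][j] + delta[i][j] for j in range(d)] for i in range(d)]

trainable = d * r * 2                      # parameters in B and A
full = d * d                               # parameters in a full update of W
```

In practice you would not hand-roll this: the peft library's LoraConfig and get_peft_model wrap a Transformers model with these adapters for you.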
Troubleshooting tips
- If you get CUDA out-of-memory errors, reduce the batch size, raise gradient_accumulation_steps, or load the model in 8-bit with load_in_8bit=True.
- Ensure the tokenizer comes from the same checkpoint as the model to avoid tokenization errors.
- Run accelerate config to set up distributed training.
- Check the Hugging Face model card for any special fine-tuning instructions or license restrictions (Llama 3 is a gated model).
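When shrinking per_device_train_batch_size to avoid out-of-memory errors, gradient accumulation lets you keep the effective batch size unchanged. A quick sketch of the arithmetic (hypothetical numbers):

```python
# Effective batch size after an OOM-driven reduction (hypothetical numbers).
per_device_batch = 1   # reduced from 4 to fit GPU memory
grad_accum_steps = 4   # set via gradient_accumulation_steps in TrainingArguments
num_gpus = 1

effective_batch = per_device_batch * grad_accum_steps * num_gpus
```

The optimizer then steps once per accumulated group, so gradients match the original batch size at the cost of more forward/backward passes per step.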
Key Takeaways
- Use Hugging Face Transformers and datasets libraries to fine-tune Llama 3 efficiently.
- Enable 8-bit loading and mixed precision to reduce GPU memory usage during fine-tuning.
- Prepare your dataset with proper tokenization matching the Llama 3 tokenizer.
- Leverage the Trainer API for streamlined training and evaluation workflows.
- Use accelerate for multi-GPU or distributed training setups.