How to set training arguments for fine-tuning
Quick answer
Set training arguments for fine-tuning by specifying parameters such as
batch_size, learning_rate, num_train_epochs, and weight_decay in your training script or configuration. These parameters control how the model learns during fine-tuning and are typically passed to frameworks such as Hugging Face's Trainer or the OpenAI fine-tuning API.

Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install transformers datasets accelerate
- Basic knowledge of Python and machine learning
Setup
Install the necessary Python libraries for fine-tuning, such as transformers and datasets. Set your OpenAI API key as an environment variable for secure access.
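Once the key has been exported (see the commands below), you can confirm it is actually visible to Python before running any training code. A minimal sketch; the helper name `get_api_key` is our own:

```python
import os

def get_api_key() -> str:
    """Return the OpenAI API key from the environment, or fail with a clear error."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it in your shell first.")
    return key
```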
pip install transformers datasets accelerate
# Set your API key in your shell environment
export OPENAI_API_KEY="your-api-key-here"

Step by step
Use Hugging Face's TrainingArguments class to configure training parameters. Below is a complete example showing how to set key training arguments and run fine-tuning on a dataset.
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, Trainer, TrainingArguments
from datasets import load_dataset

# Load dataset
raw_datasets = load_dataset("glue", "mrpc")

# Load tokenizer and model
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tokenize the paired sentences; truncation keeps inputs within the model's limit
def tokenize_function(examples):
    return tokenizer(examples["sentence1"], examples["sentence2"], truncation=True)

# Tokenize datasets
tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)

# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    save_strategy="epoch",
    logging_dir="./logs",
    logging_steps=10,
)

# Initialize Trainer; passing the tokenizer enables dynamic padding per batch
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    tokenizer=tokenizer,
)

# Train model
trainer.train()
```

Output
```
***** Running training *****
  Num examples = 3668
  Num Epochs = 3
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 687
...
Training completed. Model saved to ./results
```
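The step count in the log can be sanity-checked by hand. With 3,668 training examples, a batch size of 16, and 3 epochs, floor division (i.e., the final partial batch not counted) matches the 687 optimization steps reported above; exact counts depend on how the dataloader handles the last partial batch:

```python
num_examples = 3668
batch_size = 16
num_epochs = 3

# Steps per epoch with the final partial batch dropped (floor division)
steps_per_epoch = num_examples // batch_size
total_steps = steps_per_epoch * num_epochs
print(steps_per_epoch, total_steps)  # 229 687
```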
Common variations
You can customize training arguments for different scenarios:
- Use `fp16=True` for mixed precision training to speed up on GPUs.
- Adjust `learning_rate` and `num_train_epochs` based on dataset size and task complexity.
- Adjust `per_device_train_batch_size` to fit your GPU memory.
- For the OpenAI fine-tuning API, specify parameters like `n_epochs` and `batch_size` in the fine-tuning request payload.
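For the OpenAI route, hyperparameters travel in the job-creation request rather than a `TrainingArguments` object. A minimal sketch using the `openai` Python SDK; the values and the training file ID are illustrative placeholders, and the actual API call is left commented out:

```python
# Hyperparameters for an OpenAI fine-tuning job; the values are illustrative.
hyperparameters = {
    "n_epochs": 3,
    "batch_size": 8,
    "learning_rate_multiplier": 2.0,
}

# from openai import OpenAI
# client = OpenAI()  # reads OPENAI_API_KEY from the environment
# job = client.fine_tuning.jobs.create(
#     model="gpt-4o-mini-2024-07-18",     # a fine-tunable model
#     training_file="file-abc123",        # placeholder training file ID
#     hyperparameters=hyperparameters,
# )
```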
| Parameter | Description | Example Value |
|---|---|---|
| learning_rate | Step size for optimizer updates | 2e-5 |
| num_train_epochs | Number of passes over the training dataset | 3 |
| per_device_train_batch_size | Batch size per GPU/CPU device | 16 |
| weight_decay | L2 regularization to prevent overfitting | 0.01 |
| fp16 | Enable mixed precision training | True |
Troubleshooting
If you encounter out-of-memory errors, reduce per_device_train_batch_size or enable gradient accumulation. If training is unstable, lower the learning_rate. Always monitor logs for warnings about convergence or overfitting.
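Gradient accumulation trades wall-clock time for memory: the effective batch size is the per-device batch multiplied by the accumulation steps (and the number of devices). A quick sketch of the arithmetic, keeping the effective batch at 16 while fitting a smaller batch in GPU memory:

```python
# Smaller per-device batch, compensated by accumulating gradients over 4 steps
per_device_train_batch_size = 4
gradient_accumulation_steps = 4   # pass this to TrainingArguments
num_devices = 1

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
print(effective_batch_size)  # 16
```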
Key Takeaways
- Use `TrainingArguments` to set key fine-tuning parameters like batch size, learning rate, and epochs.
- Adjust training arguments based on your hardware and dataset size to optimize performance and stability.
- Enable mixed precision (`fp16`) to speed up training on compatible GPUs.
- Monitor training logs to catch issues like overfitting or memory errors early.