Critical severity intermediate · Fix: 5-15 min

RuntimeError

torch.cuda.OutOfMemoryError

What this error means
PyTorch raises the CUDA out-of-memory error when a requested GPU allocation is larger than the memory still free on the device, here while fine-tuning with Hugging Face's SFTTrainer.

Stack trace

traceback
RuntimeError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 8.00 GiB total capacity; 6.50 GiB already allocated; 256.00 MiB free; 6.75 GiB reserved in total by PyTorch)
QUICK FIX
Reduce batch size and enable gradient checkpointing in SFTTrainer to immediately lower GPU memory usage.

Why it happens

This error happens when the model, batch size, or sequence length requires more GPU memory than is available. During fine-tuning, SFTTrainer must hold the model weights, gradients, optimizer states, and activations on the GPU at the same time, and their combined size can exceed GPU capacity if training is not configured for the hardware.

Detection

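Monitor GPU memory usage with nvidia-smi during training to see when usage approaches total capacity. In code, a RuntimeError (or its subclass torch.cuda.OutOfMemoryError) whose message contains 'CUDA out of memory' identifies this issue at the point of failure.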
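The monitoring step can be scripted. The sketch below is one possible approach, assuming nvidia-smi is on the PATH; the helper names (`parse_gpu_memory`, `query_gpu_memory`, `near_capacity`) and the 90% threshold are illustrative choices, not part of any library.

```python
import subprocess

def parse_gpu_memory(csv_text):
    """Parse 'used, total' MiB pairs (one line per GPU) into a list of tuples."""
    readings = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        readings.append((used, total))
    return readings

def query_gpu_memory():
    """Ask nvidia-smi for used and total memory (MiB) on each GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_memory(out)

def near_capacity(readings, threshold=0.9):
    """Return indices of GPUs whose used/total ratio meets the (assumed) threshold."""
    return [i for i, (used, total) in enumerate(readings)
            if used / total >= threshold]
```

Calling `near_capacity(query_gpu_memory())` periodically from a separate process gives an early warning before training hits the limit.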

Causes & fixes

1

Batch size is too large for the available GPU memory

✓ Fix

Reduce the batch size in your training arguments to fit within GPU memory limits.

2

Sequence length or model size is too large for the GPU

✓ Fix

Lower the max sequence length or switch to a smaller model architecture to reduce memory usage.

3

Gradient checkpointing is disabled, so every intermediate activation is kept in memory for the backward pass

✓ Fix

Set gradient_checkpointing=True in the training arguments. Activations are then recomputed during the backward pass instead of stored, which trades extra compute for lower peak memory.

4

Multiple processes or other applications occupying GPU memory

✓ Fix

Check nvidia-smi for other processes holding GPU memory, stop them, and restart any stale Python kernels or notebooks so their memory is released before training starts.
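Reducing the batch size (fix 1) also shrinks the effective batch per optimizer step, which can change training behavior. A common way to compensate is gradient accumulation. The sketch below only does the arithmetic; `accumulation_steps` is a hypothetical helper name, and the target of 16 is taken from the batch size in the broken example.

```python
import math

def accumulation_steps(target_effective_batch, per_device_batch, num_devices=1):
    """Smallest number of accumulation steps that reaches the target effective batch."""
    return math.ceil(target_effective_batch / (per_device_batch * num_devices))

# Keep an effective batch of 16 while only 4 samples fit on the GPU at once.
steps = accumulation_steps(target_effective_batch=16, per_device_batch=4)
print(steps)  # 4
```

The result can be passed as gradient_accumulation_steps in TrainingArguments alongside per_device_train_batch_size=4. Memory use follows the per-device batch, not the effective batch.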

Code: broken vs fixed

Broken - triggers the error
python
# Assumes `model` and `train_dataset` were created earlier in the script.
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir='./results',
    per_device_train_batch_size=16,  # Too large for the 8 GiB GPU in the stack trace
    num_train_epochs=3,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()  # Raises the CUDA OOM error
Fixed - works correctly
python
import os

# Set before importing transformers (and torch) so CUDA only sees GPU 0
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from transformers import Trainer, TrainingArguments

# Assumes `model` and `train_dataset` were created earlier in the script.
training_args = TrainingArguments(
    output_dir='./results',
    per_device_train_batch_size=4,  # Reduced from 16 to lower activation memory
    num_train_epochs=3,
    gradient_checkpointing=True,  # Recompute activations in backward to save memory
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()  # Peak memory is lower; reduce further if OOM persists
print('Training completed successfully')
The fixed version reduces the per-device batch size from 16 to 4 and enables gradient checkpointing. Both changes lower peak GPU memory usage, which is usually enough to avoid the CUDA OOM error during fine-tuning.

Workaround

Catch RuntimeError exceptions during training, then reduce batch size or sequence length dynamically and retry training to avoid OOM crashes.
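The retry loop described above might look like the following sketch. `train_with_backoff` and `is_cuda_oom` are hypothetical helpers, and `run_training` stands in for a caller-supplied function that builds the trainer with the given batch size and calls train(). The string check is an assumption based on the message shown in the stack trace.

```python
def is_cuda_oom(exc):
    """True if the exception looks like a CUDA out-of-memory error."""
    return isinstance(exc, RuntimeError) and "out of memory" in str(exc).lower()

def train_with_backoff(run_training, batch_size, min_batch_size=1):
    """Call run_training(batch_size), halving the batch size after each OOM.

    Returns the batch size that succeeded. Re-raises errors that are not
    OOM, and the OOM itself once min_batch_size has already failed.
    """
    while True:
        try:
            run_training(batch_size)
            return batch_size
        except RuntimeError as exc:
            if not is_cuda_oom(exc) or batch_size <= min_batch_size:
                raise
            # In a real run, free cached memory before retrying,
            # e.g. gc.collect() followed by torch.cuda.empty_cache().
            batch_size = max(min_batch_size, batch_size // 2)

# Stand-in for real training: pretends only batches of 4 or fewer fit.
def fake_training(batch_size):
    if batch_size > 4:
        raise RuntimeError("CUDA out of memory. Tried to allocate 512.00 MiB")

print(train_with_backoff(fake_training, batch_size=16))  # 4
```

The retry happens on the next loop iteration, after the except block has exited, so the caught exception no longer keeps references to the failed run alive.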

Prevention

Plan GPU memory usage by profiling model size and batch parameters; use gradient checkpointing and mixed precision training to minimize memory footprint before fine-tuning.
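A rough estimate can show whether a model is likely to fit before training starts. The sketch below assumes full fine-tuning in fp32 with an Adam-style optimizer: 4 bytes per parameter for weights, 4 for gradients, and 8 for the two optimizer moments. It ignores activations, which grow with batch size and sequence length, so treat the result as a lower bound. `estimate_training_gib` is a hypothetical helper.

```python
def estimate_training_gib(num_params, bytes_weights=4, bytes_grads=4, bytes_optimizer=8):
    """Lower-bound GPU memory (GiB) for weights, gradients, and optimizer states."""
    total_bytes = num_params * (bytes_weights + bytes_grads + bytes_optimizer)
    return total_bytes / 2**30

# A 1-billion-parameter model under these assumptions:
print(f"{estimate_training_gib(1_000_000_000):.1f} GiB")  # 14.9 GiB
```

Under these assumptions a 1B-parameter model already exceeds the 8 GiB card from the stack trace before any activations are stored, which points to a smaller model or a lower-memory training setup instead of batch-size tuning alone.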

Python 3.9+ · transformers >=4.0.0 · tested on 4.30.x
Verified 2026-04