Critical severity · Intermediate · Fix: 5-15 min

RuntimeError

torch.cuda.OutOfMemoryError

What this error means

PyTorch raises a CUDA out-of-memory error when the GPU cannot satisfy a tensor allocation request, typically while training a large model or using a large batch size.

Stack trace

traceback

RuntimeError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 8.00 GiB total capacity; 6.50 GiB already allocated; 256.00 MiB free; 6.70 GiB reserved in total by PyTorch)

QUICK FIX

Reduce the batch size. Calling torch.cuda.empty_cache() before training releases cached blocks that PyTorch holds but is not using; it does not free memory occupied by live tensors.

Why it happens

This error occurs when the GPU does not have enough free memory for the tensors the model or batch needs during training. Large models, large batch sizes, and memory fragmentation are the usual causes. The allocation fails when the requested size exceeds the free GPU memory, or when the free memory is split into blocks that are each too small for the request.
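To make the numbers in the stack trace concrete, the sketch below parses that exact message and compares the requested size with what was free. The helper name `parse_oom_message` is hypothetical, and the regular expressions assume the message layout shown in this article; other PyTorch versions word the message differently.

```python
import re

# Message copied from the stack trace in this article.
MESSAGE = (
    "CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 8.00 GiB total capacity; "
    "6.50 GiB already allocated; 256.00 MiB free; 6.70 GiB reserved in total by PyTorch)"
)

UNIT_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def parse_oom_message(message):
    """Return the sizes named in an OOM message, converted to MiB (hypothetical helper)."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (KiB|MiB|GiB)",
        "total": r"([\d.]+) (KiB|MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (KiB|MiB|GiB) already allocated",
        "free": r"([\d.]+) (KiB|MiB|GiB) free",
        "reserved": r"([\d.]+) (KiB|MiB|GiB) reserved",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match:
            sizes[name] = float(match.group(1)) * UNIT_TO_MIB[match.group(2)]
    return sizes


sizes = parse_oom_message(MESSAGE)
shortfall = sizes["requested"] - sizes["free"]
cached_unused = sizes["reserved"] - sizes["allocated"]
print(f"shortfall: {shortfall:.0f} MiB")
print(f"reserved but not allocated: {cached_unused:.0f} MiB")
```

On these numbers the request exceeds free memory by 256 MiB, while roughly 205 MiB is reserved by PyTorch but not allocated to tensors. That gap is the most a cache release could return, which suggests that here the batch or model itself has to shrink.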

Detection

Monitor GPU memory usage with tools like nvidia-smi during training to see when usage approaches capacity, and catch RuntimeError exceptions with 'CUDA out of memory' in the message to detect failures when they occur.
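Both detection steps can be scripted. The following is a minimal sketch, assuming nvidia-smi is on the PATH; the function names `is_cuda_oom`, `parse_memory_csv`, and `query_gpu_memory` are hypothetical. The message check relies on torch.cuda.OutOfMemoryError being a subclass of RuntimeError in the PyTorch versions that define it.

```python
import subprocess


def is_cuda_oom(exc):
    """True for a RuntimeError (or subclass) whose message reports CUDA OOM."""
    return isinstance(exc, RuntimeError) and "CUDA out of memory" in str(exc)


def parse_memory_csv(output):
    """Parse 'used, total' pairs in MiB, one line per GPU."""
    readings = []
    for line in output.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        readings.append((used, total))
    return readings


def query_gpu_memory():
    """Ask nvidia-smi for used and total memory of each GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_memory_csv(result.stdout)
```

Calling `query_gpu_memory()` every few training steps and logging the result gives an early view of memory growth, while `is_cuda_oom(exc)` inside an `except RuntimeError as exc:` block separates OOM failures from other runtime errors.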

Causes & fixes

Batch size is too large for the available GPU memory

✓ Fix

Reduce the batch size to fit within the GPU memory limits.
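A smaller batch changes the effective batch size seen by the optimizer. One common companion technique, not part of the fix above, is gradient accumulation: run several small batches and update the weights once. The sketch below assumes `model`, `loss_fn`, and `optimizer` behave like their PyTorch counterparts and that batches are already on the right device; the function name and the `accumulation_steps` default are illustrative.

```python
def train_with_accumulation(model, dataloader, loss_fn, optimizer, accumulation_steps=4):
    """Update weights once every `accumulation_steps` micro-batches.

    Returns the number of optimizer updates performed. Trailing micro-batches
    that do not fill a complete group are left unapplied in this sketch.
    """
    optimizer.zero_grad()
    num_updates = 0
    for step, (inputs, targets) in enumerate(dataloader, start=1):
        outputs = model(inputs)
        # Divide so the accumulated gradient matches the mean over the large batch.
        loss = loss_fn(outputs, targets) / accumulation_steps
        loss.backward()  # Gradients add up across micro-batches
        if step % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
            num_updates += 1
    return num_updates
```

With a micro-batch of 32 and `accumulation_steps=4`, each update sees gradients averaged over 128 samples, while only one micro-batch of activations is held in GPU memory at a time.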

Model architecture is too large or uses excessive memory

✓ Fix

Use a smaller model or optimize the model architecture to reduce memory footprint.
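A rough lower bound helps judge whether a model can fit at all. The sketch below assumes float32 training with Adam, which keeps two state tensors per parameter in addition to the weights and gradients. Activations, temporary buffers, and allocator overhead are excluded, so real usage will be higher. The function names are hypothetical.

```python
BYTES_PER_FP32 = 4


def estimate_adam_training_bytes(num_parameters):
    """Lower bound for fp32 training with Adam; activations are not included."""
    weights = num_parameters * BYTES_PER_FP32
    gradients = num_parameters * BYTES_PER_FP32
    adam_state = 2 * num_parameters * BYTES_PER_FP32  # exp_avg and exp_avg_sq
    return weights + gradients + adam_state


def to_gib(num_bytes):
    return num_bytes / 1024**3


for params in (100_000_000, 500_000_000, 1_000_000_000):
    gib = to_gib(estimate_adam_training_bytes(params))
    print(f"{params:>13,} params -> {gib:.2f} GiB")
```

Under these assumptions a 500-million-parameter model already needs about 7.45 GiB before any activations are stored, which leaves almost nothing on an 8 GiB card like the one in the stack trace.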

GPU memory fragmentation from previous allocations

✓ Fix

Restart the training process, or call torch.cuda.empty_cache() before training to release cached blocks that are no longer in use.
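When fragmentation persists, the caching allocator can also be tuned through the PYTORCH_CUDA_ALLOC_CONF environment variable. The sketch below sets its `max_split_size_mb` option, which is available in recent PyTorch releases including 2.0.x. The value 128 is an illustrative assumption, not a recommendation, and needs tuning per workload.

```python
import os

# Must be set before the first CUDA allocation; setting it before
# `import torch` is the simplest way to guarantee that.
# 128 is an illustrative value (assumption), not a recommendation.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

`setdefault` leaves any value already exported in the shell untouched, so a setting chosen at launch time still wins.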

Not using mixed precision training to reduce memory usage

✓ Fix

Enable mixed precision training with torch.cuda.amp to lower memory consumption.

Code: broken vs fixed

Broken - triggers the error

python

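import torch

# MyLargeModel, loss_fn, and dataloader are assumed to be defined elsewhere.
model = MyLargeModel().cuda()
optimizer = torch.optim.Adam(model.parameters())
batch_size = 128  # Used to build dataloader; too large for GPU memory
for inputs, targets in dataloader:
    inputs, targets = inputs.to('cuda'), targets.to('cuda')
    optimizer.zero_grad()
    outputs = model(inputs)  # RuntimeError: CUDA out of memory here
    loss = loss_fn(outputs, targets)
    loss.backward()
    optimizer.step()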

Fixed - works correctly

python

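import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # Set before CUDA is initialized
import torch

# MyLargeModel, loss_fn, and dataloader are assumed to be defined elsewhere.
model = MyLargeModel().cuda()
optimizer = torch.optim.Adam(model.parameters())
scaler = torch.cuda.amp.GradScaler()  # Scales the loss so float16 gradients do not underflow
batch_size = 32  # Used to build dataloader; reduced to fit GPU memory
torch.cuda.empty_cache()  # Release cached, unused blocks before training
for inputs, targets in dataloader:
    inputs, targets = inputs.to('cuda'), targets.to('cuda')
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # Enable mixed precision
        outputs = model(inputs)
        loss = loss_fn(outputs, targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
print('Training finished with reduced memory usage')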

The fixed version reduces the batch size to fit GPU memory, releases cached blocks with torch.cuda.empty_cache(), and enables mixed precision training to lower memory usage and prevent OOM errors.

⚠

Workaround

Catch the RuntimeError exception, call torch.cuda.empty_cache(), reduce batch size dynamically, and retry the training step to temporarily avoid crashes.
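The workaround can be sketched as a small retry loop. Here `train_step` is a hypothetical callable that runs one step at a given batch size, and `free_cache` is an optional hook where `torch.cuda.empty_cache` can be passed. The halving policy and the `min_batch_size` floor are assumptions.

```python
def run_with_batch_backoff(train_step, batch_size, min_batch_size=1, free_cache=None):
    """Retry `train_step(batch_size)`, halving the batch size after each OOM.

    Returns (batch_size_that_worked, result). Non-OOM errors, and OOM at the
    minimum batch size, are re-raised.
    """
    while True:
        try:
            return batch_size, train_step(batch_size)
        except RuntimeError as exc:
            if "out of memory" not in str(exc) or batch_size // 2 < min_batch_size:
                raise
        # Clean up outside the except block, so the exception and the tensors
        # referenced by its traceback are no longer held when the cache is freed.
        if free_cache is not None:
            free_cache()
        batch_size //= 2
```

A call such as `run_with_batch_backoff(step_fn, 128, free_cache=torch.cuda.empty_cache)` would try 128, then 64, then 32, and so on. This keeps a job alive but hides the underlying sizing problem, so it is best treated as a stopgap.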

✓

Prevention

Profile memory while designing models and training loops, use mixed precision training and gradient checkpointing, and monitor GPU memory during runs to prevent out-of-memory errors.
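Gradient checkpointing trades compute for memory by recomputing activations during the backward pass instead of storing them. The sketch below is one way to apply it with torch.utils.checkpoint, assuming the model can be expressed as a sequence of blocks. The helper name `checkpointed_forward` is hypothetical, and `use_reentrant=False` assumes a PyTorch release that accepts that argument, such as 2.0.x.

```python
def checkpointed_forward(blocks, x):
    """Run x through `blocks` in order, checkpointing each block.

    Activations inside each block are recomputed during backward rather than
    kept in memory after the forward pass.
    """
    # Local import keeps the helper importable on machines without torch installed.
    from torch.utils.checkpoint import checkpoint

    for block in blocks:
        x = checkpoint(block, x, use_reentrant=False)
    return x
```

Inside a model's `forward`, this could replace a plain loop over layers, for example `return checkpointed_forward(self.layers, x)`. Each checkpointed block runs its forward pass a second time during backward, so training is slower in exchange for lower peak memory.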

Python 3.7+ · torch >=1.0.0 · tested on 2.0.x

Verified 2026-04
