Severity: critical · Level: intermediate · Fix: 5-15 min

RuntimeError

torch.cuda.OutOfMemoryError

What this error means
On RunPod, a GPU out-of-memory (OOM) error occurs when your model, its intermediate tensors, and the current batch together need more GPU memory than the instance provides.

Stack trace

traceback
Traceback (most recent call last):
  File "app.py", line 42, in <module>
    outputs = model(input_tensor)
  File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in forward
    ...
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 8.00 GiB total capacity; 6.50 GiB already allocated; 256.00 MiB free; 6.70 GiB reserved in total by PyTorch)
QUICK FIX
Reduce batch size or switch to a RunPod GPU instance with more memory to immediately fix the OOM error.

Why it happens

This error occurs when GPU memory cannot hold everything the workload needs at once: model parameters, intermediate activations, the input batch and, during training, gradients and optimizer state. Each RunPod GPU instance has a fixed amount of GPU memory, so a large model or batch size can exceed it, at which point PyTorch's CUDA allocator raises the out-of-memory error.
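The numbers in the error message tell you how far over the limit the request was. The sketch below pulls them out of the message from the stack trace above and does the arithmetic. The helper name `parse_oom_message` is hypothetical, and the patterns assume the wording shown in that trace; other PyTorch versions may phrase the message differently.

```python
import re

# Unit multipliers for the sizes printed in the OOM message.
_UNITS = {"KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}


def parse_oom_message(message: str) -> dict:
    """Extract byte counts from a 'CUDA out of memory' message.

    Hypothetical helper: the patterns assume the message wording shown
    in the stack trace above.
    """
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (\w+)",
        "total": r"([\d.]+) (\w+) total capacity",
        "allocated": r"([\d.]+) (\w+) already allocated",
        "free": r"([\d.]+) (\w+) free",
        "reserved": r"([\d.]+) (\w+) reserved in total",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match:
            sizes[name] = int(float(match.group(1)) * _UNITS[match.group(2)])
    return sizes


msg = (
    "CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 8.00 GiB total "
    "capacity; 6.50 GiB already allocated; 256.00 MiB free; 6.70 GiB reserved "
    "in total by PyTorch)"
)
sizes = parse_oom_message(msg)

# How much more memory the failed allocation needed than was free.
shortfall = sizes["requested"] - sizes["free"]
# Memory PyTorch holds in its cache but has not handed out to tensors.
cached_unused = sizes["reserved"] - sizes["allocated"]

print(f"shortfall: {shortfall / 1024**2:.0f} MiB")
print(f"reserved but unallocated: {cached_unused / 1024**2:.0f} MiB")
```

A large gap between reserved and allocated memory points toward fragmentation (cause 3 below). A small gap, as here, suggests the workload simply needs more memory than the GPU has.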

Detection

Monitor GPU memory usage with tools like nvidia-smi or RunPod dashboard metrics before running your workload to detect memory saturation and prevent OOM crashes.
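As a minimal sketch of that check, the snippet below reads per-GPU memory use through `nvidia-smi`'s CSV query mode. The function names are hypothetical, and the code assumes `nvidia-smi` is on the `PATH` and reports numeric values for every GPU.

```python
import shutil
import subprocess


def parse_memory_csv(output: str) -> list:
    """Parse 'index, used, total' rows (values in MiB) from the query below."""
    gpus = []
    for line in output.strip().splitlines():
        index, used, total = (int(field) for field in line.split(","))
        gpus.append(
            {
                "index": index,
                "used_mib": used,
                "total_mib": total,
                "used_fraction": used / total,
            }
        )
    return gpus


def query_gpu_memory() -> list:
    """Run nvidia-smi and return one dict per GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=index,memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_csv(result.stdout)


# Only query when run as a script on a machine that has nvidia-smi.
if __name__ == "__main__" and shutil.which("nvidia-smi"):
    for gpu in query_gpu_memory():
        print(
            f"GPU {gpu['index']}: {gpu['used_mib']} / {gpu['total_mib']} MiB "
            f"({gpu['used_fraction']:.0%} used)"
        )
```

Running this before launching the workload shows whether another process is already holding memory on the GPU.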

Causes & fixes

1

Batch size is too large for the available GPU memory

✓ Fix

Reduce the batch size in your training or inference code to fit within the GPU memory limits.

2

Model size or architecture requires more memory than the RunPod GPU instance provides

✓ Fix

Switch to a RunPod instance with a larger GPU memory capacity or optimize the model to be smaller.

3

Memory fragmentation or leftover tensors not freed during iterative runs

✓ Fix

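Delete references to tensors you no longer need (with del, or by letting them go out of scope), then call torch.cuda.empty_cache() between iterations to return cached blocks to the GPU. empty_cache() alone does not free memory that live tensors still hold.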

4

Loading multiple large models or data on the same GPU simultaneously

✓ Fix

Load only one model at a time or distribute models across multiple GPUs if available.
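For cause 2, a rough estimate of weight memory helps decide whether a model can fit at all. The sketch below multiplies the parameter count by the bytes per parameter for a few common dtypes. The 1.3-billion-parameter model is a made-up example, and the result covers weights only: activations, gradients, and optimizer state come on top.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gib(num_params: int, dtype: str = "float32") -> float:
    """Memory needed for the model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


# Hypothetical model size. With PyTorch you could count parameters with
# sum(p.numel() for p in model.parameters()).
num_params = 1_300_000_000

for dtype in ("float32", "float16"):
    print(f"{dtype}: {weight_memory_gib(num_params, dtype):.2f} GiB for weights")
```

If the weights alone come close to the GPU's capacity, reducing the batch size will not help much; a larger instance or a lower-precision dtype is the more likely fix.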

Code: broken vs fixed

Broken - triggers the error
python
import torch

# MyLargeModel is a placeholder for your own torch.nn.Module
model = MyLargeModel().cuda()
input_tensor = torch.randn(64, 3, 224, 224).cuda()  # Large batch size
outputs = model(input_tensor)  # This line triggers the OOM error
Fixed - works correctly
python
import torch

# MyLargeModel is a placeholder for your own torch.nn.Module
model = MyLargeModel().cuda()
input_tensor = torch.randn(16, 3, 224, 224).cuda()  # Reduced batch size to fix OOM
outputs = model(input_tensor)  # Fits in memory if 16 samples are small enough for your GPU
print('Inference completed successfully')
Reduced the batch size from 64 to 16 to fit the model and data within the available GPU memory, preventing the CUDA out of memory error.
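If you still need results for all 64 samples, you can feed them through the model in smaller chunks instead of discarding data. The sketch below is one way to do that. The helper names are hypothetical, and the code relies only on `len()` and slicing, so it works on a tensor's first dimension as well as on the plain list used in the demo.

```python
def iter_micro_batches(batch, micro_batch_size):
    """Yield consecutive slices of at most micro_batch_size items."""
    if micro_batch_size < 1:
        raise ValueError("micro_batch_size must be at least 1")
    for start in range(0, len(batch), micro_batch_size):
        yield batch[start:start + micro_batch_size]


def run_in_micro_batches(fn, batch, micro_batch_size):
    """Apply fn to each micro-batch and collect the per-chunk results."""
    return [fn(chunk) for chunk in iter_micro_batches(batch, micro_batch_size)]


# Stand-in demo: a list of 64 items processed 16 at a time.
chunk_sizes = run_in_micro_batches(len, list(range(64)), 16)
print(chunk_sizes)
```

With PyTorch, the equivalent call would be `torch.cat(run_in_micro_batches(model, input_tensor, 16))`. For inference, wrapping that call in `torch.no_grad()` also avoids keeping activations around for a backward pass.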

Workaround

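Catch torch.cuda.OutOfMemoryError (a subclass of RuntimeError; on older PyTorch versions that lack it, catch RuntimeError and check the message for "out of memory"), clear the cache with torch.cuda.empty_cache(), reduce the batch size, and retry the operation instead of crashing.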
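The workaround can be sketched as a retry loop that halves the batch size after each OOM. `run_with_oom_retry` and `fake_step` are hypothetical names; `step` stands for whatever function runs your model at a given batch size. The cleanup runs after the `except` block has exited, so the caught exception no longer keeps references to the failed call's tensors alive.

```python
def is_cuda_oom(exc: BaseException) -> bool:
    """True for RuntimeErrors (including subclasses) that carry the OOM text."""
    return isinstance(exc, RuntimeError) and "out of memory" in str(exc)


def run_with_oom_retry(step, batch_size, min_batch_size=1, on_oom=None):
    """Call step(batch_size), halving batch_size after each OOM.

    Returns (result, batch_size_that_worked). on_oom is an optional cleanup
    callback, for example torch.cuda.empty_cache.
    """
    while True:
        try:
            return step(batch_size), batch_size
        except RuntimeError as exc:
            if not is_cuda_oom(exc):
                raise  # Not an OOM: do not hide other failures.
        # Reached only after an OOM, once the except block has exited.
        if batch_size <= min_batch_size:
            raise RuntimeError("still out of memory at the minimum batch size")
        if on_oom is not None:
            on_oom()
        batch_size = max(min_batch_size, batch_size // 2)


# Stand-in demo: pretend anything above 16 samples does not fit.
cleanup_calls = []


def fake_step(batch_size):
    if batch_size > 16:
        raise RuntimeError("CUDA out of memory. Tried to allocate 512.00 MiB")
    return f"processed {batch_size} samples"


result, used_batch_size = run_with_oom_retry(
    fake_step, 64, on_oom=lambda: cleanup_calls.append("cleared")
)
print(result, used_batch_size, len(cleanup_calls))
```

In real code you would pass `on_oom=torch.cuda.empty_cache` and a `step` that builds the batch and runs the model.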

Prevention

Design your workloads to monitor GPU memory usage and use adaptive batch sizing or model parallelism. Prefer RunPod instances with sufficient GPU memory for your model size and workload.
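One simple form of adaptive batch sizing is a pre-flight calculation: pick the largest batch that fits in the memory currently free, minus a safety margin. The sketch below assumes memory use grows linearly with batch size, which is only an approximation. The function name and all the numbers in the example are hypothetical.

```python
def max_batch_size(free_bytes, per_sample_bytes, fixed_bytes=0, headroom=0.1):
    """Largest batch size expected to fit, under a linear-growth assumption.

    free_bytes:        GPU memory currently free (in PyTorch, the first value
                       returned by torch.cuda.mem_get_info())
    per_sample_bytes:  extra memory each sample needs, measured empirically
    fixed_bytes:       memory needed regardless of batch size (e.g. weights
                       that are not loaded yet)
    headroom:          fraction of free memory kept in reserve
    """
    if per_sample_bytes <= 0:
        raise ValueError("per_sample_bytes must be positive")
    usable = free_bytes * (1 - headroom) - fixed_bytes
    if usable <= 0:
        return 0
    return int(usable // per_sample_bytes)


GIB = 1024**3
MIB = 1024**2

# Made-up numbers: 6 GiB free, 2 GiB of weights still to load,
# roughly 64 MiB of activations per sample, 10% kept in reserve.
batch = max_batch_size(6 * GIB, 64 * MIB, fixed_bytes=2 * GIB, headroom=0.1)
print(f"suggested batch size: {batch}")
```

A result of 0 means the model does not fit even with a single sample, which points toward a larger RunPod instance or model parallelism rather than further batch-size tuning.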

Python 3.9+ · torch >=1.0.0 · tested on 2.0.x
Verified 2026-04