RuntimeError
torch.cuda.OutOfMemoryError (RuntimeError)
Stack trace
RuntimeError: CUDA out of memory. Tried to allocate XX GiB (GPU 0; XX GiB total capacity; XX GiB already allocated; XX GiB free; XX GiB reserved)
File "/path/to/accelerate/utils.py", line XXX, in device_map_auto
...
File "/path/to/transformers/modeling_utils.py", line XXX, in to
...
Why it happens
With device_map='auto', the accelerate library automatically splits the model and places its layers across the available GPUs. If the model is larger than the combined GPU memory, or the automatic heuristic misestimates how much memory a layer needs, loading fails with a CUDA out-of-memory error. This is most common with very large models or on machines with limited GPU memory.
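As a rough illustration of the first condition, weight memory can be approximated as the parameter count times the bytes per parameter, then compared with the combined GPU capacity. This is a back-of-the-envelope sketch, not how accelerate computes its map: the 13-billion-parameter model and the two 16 GiB GPUs are made-up numbers, the helper names are hypothetical, and the estimate ignores activations, buffers, and CUDA context overhead.

```python
# Bytes per parameter for common dtypes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}
GIB = 1024 ** 3


def estimate_weight_bytes(num_params: int, dtype: str = "float32") -> int:
    """Approximate memory needed for the weights alone."""
    return num_params * BYTES_PER_PARAM[dtype]


def fits_on_gpus(num_params: int, dtype: str, gpu_capacities_gib: list) -> bool:
    """Compare the weight estimate with the summed GPU capacity (in GiB)."""
    return estimate_weight_bytes(num_params, dtype) <= sum(gpu_capacities_gib) * GIB


# Hypothetical example: a 13B-parameter model on two 16 GiB GPUs.
num_params = 13_000_000_000
gpus_gib = [16, 16]
for dtype in ("float32", "float16"):
    needed_gib = estimate_weight_bytes(num_params, dtype) / GIB
    verdict = "fits" if fits_on_gpus(num_params, dtype, gpus_gib) else "does not fit"
    print(f"{dtype}: about {needed_gib:.1f} GiB of weights, {verdict} in {sum(gpus_gib)} GiB")
```

Under these assumed numbers the float32 weights alone exceed the combined capacity, while the float16 weights fit with some room left for activations.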
Detection
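Monitor GPU memory usage before and during model loading with device_map='auto'. Catch CUDA out-of-memory errors (torch.cuda.OutOfMemoryError, a subclass of RuntimeError) and log memory statistics at that point, so failures can be diagnosed and loads that run close to capacity are spotted early.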
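One possible way to implement this monitoring is sketched below. The helper names is_cuda_oom and gpu_memory_report are hypothetical; torch.cuda.mem_get_info and torch.cuda.OutOfMemoryError are PyTorch APIs, and the string check is a fallback for older PyTorch versions that raise a plain RuntimeError.

```python
try:
    import torch
except ImportError:  # keeps the pure-Python helper usable without PyTorch
    torch = None


def is_cuda_oom(exc: BaseException) -> bool:
    """Return True if the exception looks like a CUDA out-of-memory error."""
    if torch is not None:
        oom_type = getattr(torch.cuda, "OutOfMemoryError", None)
        if oom_type is not None and isinstance(exc, oom_type):
            return True
    # Fallback: older versions raise RuntimeError with this message text.
    return isinstance(exc, RuntimeError) and "out of memory" in str(exc).lower()


def gpu_memory_report() -> list:
    """Return one line per visible GPU with free and total memory in GiB."""
    if torch is None or not torch.cuda.is_available():
        return []
    lines = []
    for index in range(torch.cuda.device_count()):
        free, total = torch.cuda.mem_get_info(index)  # both values are in bytes
        lines.append(
            f"GPU {index}: {free / 1024**3:.1f} GiB free of {total / 1024**3:.1f} GiB"
        )
    return lines
```

A typical use would be to print gpu_memory_report() just before calling from_pretrained, and again inside an except RuntimeError branch when is_cuda_oom(exc) is true.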
Causes & fixes
Model size exceeds total available GPU memory across devices
Use a smaller model, load the weights in half precision (fp16 or bf16), or offload part of the model to CPU to reduce the GPU memory footprint.
Automatic device map heuristic misestimates layer memory requirements
Manually specify device_map to control layer placement and avoid overloading any single GPU.
GPU memory is fragmented or already reserved, so no sufficiently large block is available for allocation
Restart the Python process to release GPU memory, or call torch.cuda.empty_cache() before loading the model to return cached but unused blocks to the driver.
Using device_map='auto' with insufficient GPUs or incompatible hardware
Verify that the expected GPUs are visible and compatible; consider device_map='balanced' or device_map='sequential' for more predictable placement.
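One way to keep the automatic mapping from overloading a single GPU is to cap each device with the max_memory argument that from_pretrained accepts alongside device_map. The sketch below builds such a dictionary from per-GPU free memory. The helper name build_max_memory and the 10% headroom default are assumptions chosen for illustration, not library defaults.

```python
GIB = 1024 ** 3


def build_max_memory(free_bytes_per_gpu, headroom_fraction=0.1, cpu_gib=None):
    """Build a max_memory dict that leaves some headroom on every GPU.

    free_bytes_per_gpu: free bytes per GPU index, for example the first
    value returned by torch.cuda.mem_get_info(i) for each device.
    """
    max_memory = {}
    for index, free_bytes in enumerate(free_bytes_per_gpu):
        usable_gib = int(free_bytes * (1 - headroom_fraction) / GIB)
        max_memory[index] = f"{usable_gib}GiB"
    if cpu_gib is not None:
        max_memory["cpu"] = f"{cpu_gib}GiB"  # allows offloading to CPU RAM
    return max_memory


# Hypothetical machine: 16 GiB and 8 GiB free, 25% headroom, 32 GiB of CPU RAM.
max_memory = build_max_memory([16 * GIB, 8 * GIB], headroom_fraction=0.25, cpu_gib=32)
print(max_memory)

# The result would then be passed to the loader, for example:
# model = AutoModelForCausalLM.from_pretrained(
#     "big-model", device_map="auto", max_memory=max_memory)
```

The headroom leaves space for activations and temporary buffers, which the weight placement alone does not account for.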
Code: broken vs fixed
# Broken: device_map='auto' is left to place a model that does not fit
from transformers import AutoModelForCausalLM
model_name = "big-model"
model = AutoModelForCausalLM.from_pretrained(model_name, device_map='auto')  # triggers OOM error
# Fixed: explicit device map and dtype
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # limit to one GPU; set before CUDA is initialized
from transformers import AutoModelForCausalLM
model_name = "big-model"
# device_map={'': 0} places the whole model on GPU 0; torch_dtype='auto' keeps the
# checkpoint's saved dtype instead of upcasting to float32
model = AutoModelForCausalLM.from_pretrained(model_name, device_map={'': 0}, torch_dtype='auto')
print("Model loaded successfully")
Workaround
Catch the RuntimeError, call torch.cuda.empty_cache(), and retry loading with a smaller model or manual device_map to avoid OOM.
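The retry described above could be structured as follows. This is a minimal sketch: load_with_fallback is a hypothetical helper, the loaders are passed in as zero-argument callables so the retry logic stays independent of any particular model, and cleanup would typically be torch.cuda.empty_cache.

```python
import gc


def load_with_fallback(attempts, cleanup=None):
    """Try each (name, loader) pair in order, moving on only after an OOM error.

    attempts: list of (name, zero-argument callable) pairs.
    cleanup: optional callable run after each failed attempt.
    """
    last_message = "no attempts were given"
    for name, loader in attempts:
        try:
            return name, loader()
        except RuntimeError as exc:
            if "out of memory" not in str(exc).lower():
                raise  # not an OOM error, so do not hide it
            last_message = str(exc)
        # Clean up outside the except block so the failed attempt's frames
        # (and any tensors they reference) can be released first.
        gc.collect()
        if cleanup is not None:
            cleanup()
    raise RuntimeError(f"all loading attempts ran out of memory; last error: {last_message}")


# Example wiring (model names are placeholders):
# import torch
# from transformers import AutoModelForCausalLM
# name, model = load_with_fallback(
#     [
#         ("auto", lambda: AutoModelForCausalLM.from_pretrained("big-model", device_map="auto")),
#         ("manual", lambda: AutoModelForCausalLM.from_pretrained("big-model", device_map={"": 0}, torch_dtype="auto")),
#     ],
#     cleanup=torch.cuda.empty_cache,
# )
```

Errors other than out-of-memory are re-raised immediately, so a misconfigured checkpoint or a shape mismatch is not masked by the retry.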
Prevention
Use explicit device_map settings or a model parallelism strategy, and check free GPU memory before loading large models, so that the automatic mapping is never asked to place more than the devices can hold.