Critical severity advanced · Fix: 15-30 min

RuntimeError

builtins.RuntimeError (raised by PyTorch's CUDA error check)

What this error means

A CUDA kernel launched by PyTorch read or wrote GPU memory outside any valid allocation. The driver aborts the kernel, the CUDA context becomes unusable, and PyTorch reports the failure as a RuntimeError.

Stack trace

traceback

Traceback (most recent call last):
  File "train.py", line 45, in <module>
    output = model(input_tensor)  # triggers CUDA illegal memory access
RuntimeError: CUDA error: an illegal memory access was encountered

QUICK FIX

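Rerun with the environment variable CUDA_LAUNCH_BLOCKING=1 (or call torch.cuda.synchronize() right after suspect operations) so the error is raised at the operation that actually faulted. Then fix the out-of-bounds index or invalid access it points to.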
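A minimal sketch of the quick fix, assuming a script shaped like the train.py in the traceback. CUDA_LAUNCH_BLOCKING is only read when CUDA initializes, so it is set before torch is imported (running `CUDA_LAUNCH_BLOCKING=1 python train.py` from the shell is equivalent). The small Linear model is a stand-in for your own, and the script falls back to CPU when no GPU is present.

```python
import os

# Must be set before CUDA initializes; setting it before importing torch is the safest choice.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(4, 2).to(device)  # stand-in for your model
input_tensor = torch.randn(3, 4, device=device)

output = model(input_tensor)
if device == "cuda":
    # Any pending kernel error is raised here instead of at a later, unrelated call.
    torch.cuda.synchronize()
print(output.shape)
```

Blocking launches slow every kernel down, so remove the variable once the faulting operation is found.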

Why it happens

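This error occurs when a CUDA kernel reads or writes GPU memory outside a valid allocation, most often because of out-of-bounds indexing, race conditions between streams, or tensors holding garbage values. Because CUDA kernels launch asynchronously, the error is usually reported at a later synchronization point, so the line in the traceback is often not the real culprit. An earlier failed operation can also leave the CUDA context in an invalid state, after which every CUDA call raises the same error.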

Detection

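Run with CUDA_LAUNCH_BLOCKING=1 so kernel launches become synchronous and the traceback points at the faulting operation. To narrow the search without slowing every kernel, call torch.cuda.synchronize() after suspect operations and catch RuntimeError there. For custom CUDA kernels, running the script under NVIDIA's compute-sanitizer (compute-sanitizer python train.py) reports the exact out-of-bounds access.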
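The synchronize-and-catch approach can be wrapped in a small helper. This is a sketch: run_checked is a hypothetical name, not a PyTorch API. It runs a step, forces pending CUDA work to finish, and tags any RuntimeError with a label so the log shows which step it surfaced in.

```python
import torch


def run_checked(label, fn, *args, **kwargs):
    """Hypothetical helper: run fn, synchronize CUDA, and tag any RuntimeError with a label."""
    try:
        result = fn(*args, **kwargs)
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # surface asynchronous kernel errors at this step
        return result
    except RuntimeError as err:
        raise RuntimeError(f"[{label}] {err}") from err


device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.ones(3, device=device)
b = torch.ones(3, device=device)
total = run_checked("add", torch.add, a, b)
print(total)
```

Because kernels run asynchronously, a label only narrows the fault to work queued since the previous check, not necessarily to fn itself. Each synchronize stalls the GPU, so keep these checks for debugging runs.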

Causes & fixes

Out-of-bounds indexing in custom CUDA kernels or PyTorch operations

✓ Fix

Check that every index falls within the valid range of the dimension it indexes, including index tensors passed to gather, scatter, index_select, and nn.Embedding, and add explicit bounds checks in custom CUDA kernels.
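Index tensors that live on the GPU are only checked inside the kernel, if at all, so a host-side check gives a clearer error while debugging. A sketch, assuming integer index tensors and treating negative values as invalid; check_indices is a hypothetical helper, and the 10-row table stands in for something like an embedding weight.

```python
import torch


def check_indices(index, size, name="index"):
    """Hypothetical helper: raise IndexError if any entry of index lies outside [0, size).

    Costs one device-to-host sync per call, so use it while debugging.
    """
    if index.numel() == 0:
        return
    bad = (index < 0) | (index >= size)
    if bool(bad.any()):
        offending = index[bad][:5].tolist()  # report at most five offending values
        raise IndexError(f"{name} out of range for size {size}: {offending}")


device = "cuda" if torch.cuda.is_available() else "cpu"
weights = torch.randn(10, 8, device=device)  # e.g. a table with 10 rows
token_ids = torch.tensor([0, 3, 9], device=device)
check_indices(token_ids, weights.size(0), "token_ids")
rows = weights.index_select(0, token_ids)
print(rows.shape)
```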

Use of uninitialized or corrupted GPU tensors

✓ Fix

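Write to tensors created with torch.empty() before reading them: their contents are arbitrary, and garbage values used as indices or sizes can send a kernel out of bounds. Verify shapes, dtypes, and device placement before CUDA operations, and call .contiguous() on inputs to custom kernels that assume a dense layout.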
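These checks can be bundled into a pre-flight function run before a tensor reaches a CUDA op or custom kernel. A sketch: validate_tensor is a hypothetical helper, and the NaN/Inf test only catches some garbage, since uninitialized memory can also hold finite values.

```python
import torch


def validate_tensor(t, *, device, shape, dtype=None):
    """Hypothetical pre-flight check; returns a contiguous tensor if all checks pass."""
    if t.device.type != torch.device(device).type:
        raise ValueError(f"expected device {device}, got {t.device}")
    if tuple(t.shape) != tuple(shape):
        raise ValueError(f"expected shape {tuple(shape)}, got {tuple(t.shape)}")
    if dtype is not None and t.dtype != dtype:
        raise ValueError(f"expected dtype {dtype}, got {t.dtype}")
    # NaN/Inf often signals uninitialized or corrupted data; finite garbage can still slip through.
    if t.is_floating_point() and not bool(torch.isfinite(t).all()):
        raise ValueError("tensor contains NaN or Inf")
    return t.contiguous()  # many custom kernels assume a dense, row-major layout


device = "cuda" if torch.cuda.is_available() else "cpu"
buf = torch.zeros(4, 3, device=device)  # initialized, unlike torch.empty(4, 3)
buf = validate_tensor(buf.t(), device=device, shape=(3, 4), dtype=torch.float32)
print(buf.is_contiguous())  # True
```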

Race conditions or improper synchronization between CUDA streams

✓ Fix

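Order work across streams with Stream.wait_stream() or CUDA events, call Tensor.record_stream() on tensors used on a stream other than the one that allocated them, and avoid concurrent writes to the same memory. torch.cuda.synchronize() also works, but it is a blunt fix that stalls the whole device.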
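A sketch of explicit stream ordering, assuming x lives on the current CUDA device; double_on_side_stream is a hypothetical example function, and it falls back to plain computation on CPU. The side stream waits for x to be produced, the main stream waits for the result, and record_stream stops the caching allocator from reusing y's memory while the main stream may still read it.

```python
import torch


def double_on_side_stream(x):
    """Compute x * 2 on a side CUDA stream with explicit ordering (plain code on CPU)."""
    if not x.is_cuda:
        return x * 2
    main = torch.cuda.current_stream()
    side = torch.cuda.Stream()
    side.wait_stream(main)  # side waits until work already queued on main (producing x) is done
    with torch.cuda.stream(side):
        y = x * 2  # y is allocated on the side stream
    main.wait_stream(side)  # main waits before anything reads y
    y.record_stream(main)  # y is also used on main; keep the allocator from reusing it early
    return y


device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.arange(4.0, device=device)
y = double_on_side_stream(x)
print(y.tolist())  # [0.0, 2.0, 4.0, 6.0]
```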

Previous CUDA errors not cleared, causing cascading failures

✓ Fix

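An illegal memory access is a sticky error: it corrupts the CUDA context, and every later CUDA call in the same process fails with the same error. torch.cuda.empty_cache() and torch.cuda.synchronize() cannot clear it. Restart the process, then rerun with CUDA_LAUNCH_BLOCKING=1 to find the first operation that failed.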

Code: broken vs fixed

Broken - triggers the error

python

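import torch

tensor = torch.randn(10, device='cuda')
index = torch.tensor([15], device='cuda')  # out of range: valid indices are 0..9
value = tensor[index]  # launch returns immediately; the out-of-bounds read happens on the GPU
print(value)  # RuntimeError: CUDA error (stock ops report a device-side assert; unchecked custom kernels an illegal memory access)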

Fixed - works correctly

python

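import os

# Report CUDA errors at the call that caused them; must be set before CUDA initializes.
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

import torch

tensor = torch.randn(10, device='cuda')
index = torch.tensor([9], device='cuda')  # within bounds: valid indices are 0..9

# Validate indices before launching the indexing kernel.
if index.min() < 0 or index.max() >= tensor.size(0):
    raise IndexError(f'index out of range for dimension of size {tensor.size(0)}')

value = tensor[index]  # No error
print(value)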

The index now stays within 0..9, the valid range for a 10-element tensor, so the kernel never reads past the end of the allocation.

⚠

Workaround

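An illegal memory access cannot be recovered inside the process that hit it: the CUDA context stays corrupted, and torch.cuda.synchronize() or torch.cuda.empty_cache() will only raise the same error again. Catch the RuntimeError just long enough to log diagnostics, then exit and relaunch the job in a fresh process that resumes from the last checkpoint saved before the crash.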
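The relaunch step can be automated with a small supervisor, sketched here under the assumption that your training command resumes from its own latest checkpoint. run_with_restarts is a hypothetical helper, and a flag like --resume in the usage line is a placeholder for whatever your script accepts. Restarts only help with transient faults; a deterministic indexing bug will crash every attempt and must be fixed.

```python
import subprocess
import sys


def run_with_restarts(cmd, max_restarts=3):
    """Relaunch cmd in a fresh process (and so a fresh CUDA context) after a crash.

    Returns the number of failed attempts before success; raises if every attempt fails.
    """
    for attempt in range(max_restarts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        print(f"attempt {attempt} exited with code {result.returncode}; restarting", file=sys.stderr)
    raise RuntimeError(f"command failed after {max_restarts + 1} attempts: {cmd}")
```

Usage would look like run_with_restarts([sys.executable, "train.py", "--resume"]), with the arguments adapted to your own script.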

✓

Prevention

Validate tensor shapes and index ranges, order cross-stream work explicitly, and test custom CUDA kernels against a CPU reference (and under compute-sanitizer) before using them in training.
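One way to test a kernel against a CPU reference is sketched below, assuming your op is wrapped as a Python callable that accepts tensors on either device. matches_cpu_reference is a hypothetical helper, and gather_rows stands in for a custom op. Running on CPU first also helps debugging: an out-of-bounds index there usually raises a readable IndexError instead of an asynchronous CUDA error.

```python
import torch


def matches_cpu_reference(fn, *args, atol=1e-5, rtol=1e-4):
    """Run fn on CPU copies of the inputs, then on CUDA if available, and compare results."""
    cpu_out = fn(*[a.cpu() for a in args])
    if not torch.cuda.is_available():
        return True  # no GPU to compare against on this machine
    gpu_out = fn(*[a.cuda() for a in args])
    torch.cuda.synchronize()  # surface any kernel error before comparing
    return torch.allclose(cpu_out, gpu_out.cpu(), atol=atol, rtol=rtol)


def gather_rows(table, idx):
    return table[idx]  # stand-in for a custom op


table = torch.randn(10, 4)
idx = torch.tensor([1, 5, 9])
print(matches_cpu_reference(gather_rows, table, idx))  # True
```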

Python 3.7+ · torch >=1.0.0 · tested on 2.0.x

Verified 2026-04
