High severity · Intermediate · Fix: 5-15 min

RuntimeError

torch._C._RuntimeError

What this error means

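PyTorch raises this RuntimeError from loss.backward() when autograd anomaly detection is enabled (torch.autograd.set_detect_anomaly(True) or the torch.autograd.detect_anomaly() context manager) and a backward function returns NaN gradients. Without anomaly detection, backward() does not raise: the NaNs flow silently into the gradients and, after optimizer.step(), into the parameters.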

Stack trace

traceback

Traceback (most recent call last):
  File "train.py", line 45, in <module>
    loss.backward()  # RuntimeError: grad can be NaN or Inf
RuntimeError: Function 'SomeFunctionBackward' returned nan values in its 0th output.

QUICK FIX

Clip gradients with torch.nn.utils.clip_grad_norm_ between loss.backward() and optimizer.step(), and reduce the learning rate so gradients stay finite.

Why it happens

During backpropagation, gradients can become NaN if the model outputs invalid values (like Inf or NaN), or if operations cause numerical instability such as division by zero or exploding gradients. This corrupts the gradient computation and halts training.
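The mechanism can be reproduced in a few lines. The sketch below is an illustrative case, not the only trigger: the forward value is finite, but the backward of sqrt at zero computes 0 / 0. The function names are hypothetical.

```python
import torch


def silent_nan_grad() -> torch.Tensor:
    """Without anomaly detection, backward() succeeds and the gradient is NaN."""
    x = torch.tensor([0.0], requires_grad=True)
    # Forward: sqrt(0) * 0 = 0 (finite).
    # Backward of sqrt: incoming grad 0 divided by 2 * sqrt(0) = 0 / 0 = NaN.
    (torch.sqrt(x) * 0.0).sum().backward()
    return x.grad


def backward_raises_on_nan() -> bool:
    """With anomaly detection, the same backward pass raises RuntimeError."""
    x = torch.tensor([0.0], requires_grad=True)
    with torch.autograd.detect_anomaly():
        y = (torch.sqrt(x) * 0.0).sum()
        try:
            y.backward()
        except RuntimeError as err:
            return "returned nan values" in str(err)
    return False
```

Anomaly detection adds noticeable overhead, so it is usually switched on only while debugging.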

Detection

Monitor gradients and loss values during training using hooks or logging; detect NaNs early by checking tensor.isnan() or tensor.isinf() after backward calls.
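A minimal sketch of such a check, assuming a standard nn.Module; the helper name find_nonfinite_grads is hypothetical. It uses torch.isfinite, which covers both NaN and Inf in one call.

```python
import torch
from torch import nn


def find_nonfinite_grads(model: nn.Module) -> list[str]:
    """Return the names of parameters whose gradient contains NaN or Inf."""
    bad = []
    for name, param in model.named_parameters():
        # Parameters that did not take part in the backward pass have grad None.
        if param.grad is not None and not torch.isfinite(param.grad).all():
            bad.append(name)
    return bad
```

Call it right after loss.backward() and before optimizer.step(); a non-empty result identifies which parameters received bad gradients for that batch.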

Causes & fixes

Exploding gradients produce very large values that overflow to Inf and then become NaN (for example through Inf - Inf or 0 * Inf) during the backward pass.

✓ Fix

Apply gradient clipping using torch.nn.utils.clip_grad_norm_ or clip_grad_value_ between loss.backward() and optimizer.step() to keep gradients within a stable range.

Invalid input data or labels (e.g., NaNs or Infs) propagate through the model causing NaN loss and gradients.

✓ Fix

Validate and clean input tensors before training; use torch.isnan() and torch.isinf() to filter or replace invalid values.
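One way to filter a batch, sketched under the assumption that the first dimension of both tensors indexes samples; the helper name drop_nonfinite_samples is hypothetical.

```python
import torch


def drop_nonfinite_samples(data: torch.Tensor, target: torch.Tensor):
    """Keep only samples whose features and targets are all finite."""
    n = data.shape[0]
    # Flatten everything except the batch dimension, then require every
    # element of a sample to be finite (neither NaN nor +/-Inf).
    data_ok = torch.isfinite(data).reshape(n, -1).all(dim=1)
    target_ok = torch.isfinite(target).reshape(n, -1).all(dim=1)
    keep = data_ok & target_ok
    return data[keep], target[keep]
```

If replacing values is preferable to dropping samples, torch.nan_to_num can substitute fixed numbers for NaN and Inf, at the cost of hiding the underlying data problem.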

Numerical instability in model operations such as division by zero, log of zero, or sqrt of negative values.

✓ Fix

Add small epsilon values to denominators or clamp inputs to log/sqrt functions; prefer numerically stable built-ins such as torch.nn.functional.log_softmax or torch.logsumexp over hand-written equivalents like log(softmax(x)).
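A sketch of these guards; EPS and the helper names are illustrative choices, and the right epsilon depends on the dtype and the scale of your values.

```python
import torch
import torch.nn.functional as F

EPS = 1e-8  # illustrative; tune for your dtype and value range


def safe_log(x: torch.Tensor) -> torch.Tensor:
    # Clamp away from zero so log never sees 0 or a negative value.
    return torch.log(x.clamp_min(EPS))


def safe_sqrt(x: torch.Tensor) -> torch.Tensor:
    # The epsilon keeps the derivative 1 / (2 * sqrt(x)) finite at x = 0.
    return torch.sqrt(x.clamp_min(0.0) + EPS)


def stable_log_probs(logits: torch.Tensor) -> torch.Tensor:
    # log_softmax avoids the underflow that log(softmax(x)) hits on large logits.
    return F.log_softmax(logits, dim=1)
```

For logits such as [1000.0, 0.0], torch.log(torch.softmax(...)) yields -inf for the second entry, while log_softmax stays finite.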

Learning rate too high causing parameter updates to diverge and produce NaNs.

✓ Fix

Reduce the learning rate and use learning rate schedulers to stabilize training.
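As an illustration, the sketch below attaches a StepLR scheduler to Adam on a toy model. The model, the random data, and the schedule values (halve every 2 epochs) are assumptions chosen only to keep the example small.

```python
import torch
from torch import nn


def train_with_step_decay(num_epochs: int = 4) -> list[float]:
    """Train a toy model and return the learning rate after each epoch."""
    torch.manual_seed(0)
    model = nn.Linear(4, 1)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Multiply the learning rate by 0.5 every 2 epochs.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)
    data, target = torch.randn(8, 4), torch.randn(8, 1)

    lrs = []
    for _ in range(num_epochs):
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(data), target)
        loss.backward()
        optimizer.step()
        scheduler.step()  # step the scheduler after the optimizer
        lrs.append(scheduler.get_last_lr()[0])
    return lrs
```

Other schedulers in torch.optim.lr_scheduler follow the same pattern; only the constructor changes.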

Code: broken vs fixed

Broken - triggers the error

python

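import torch

# MyModel, dataloader, and loss_fn are placeholders for your own model, data, and loss.
torch.autograd.set_detect_anomaly(True)  # needed for backward() to raise on NaN

model = MyModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)  # high LR, no clipping

for data, target in dataloader:
    optimizer.zero_grad()
    output = model(data)
    loss = loss_fn(output, target)
    loss.backward()  # RuntimeError: Function '...Backward' returned nan values
    optimizer.step()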

Fixed - works correctly

python

import torch

# MyModel, dataloader, and loss_fn are placeholders for your own model, data, and loss.
model = MyModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Reduced LR

for data, target in dataloader:
    optimizer.zero_grad()
    output = model(data)
    loss = loss_fn(output, target)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # Added gradient clipping
    optimizer.step()

print("Training step completed without NaN gradients.")

Reduced learning rate and added gradient clipping to prevent exploding gradients causing NaNs during backward pass.

⚠

Workaround

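With anomaly detection enabled, wrap loss.backward() in try/except RuntimeError, catch the NaN error, skip optimizer.step() for that batch, and log the inputs for offline debugging. Without anomaly detection, backward() does not raise, so check torch.isfinite() on the loss and gradients before calling optimizer.step().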
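A sketch of this workaround; the function name guarded_step and the message check are illustrative. It relies on anomaly detection to make backward() raise, which slows training, so treat it as a debugging aid.

```python
import torch
from torch import nn


def guarded_step(model, optimizer, loss_fn, data, target) -> bool:
    """Run one training step; return False if the batch was skipped."""
    optimizer.zero_grad()
    try:
        # Anomaly detection makes backward() raise when a gradient is NaN.
        with torch.autograd.detect_anomaly():
            loss = loss_fn(model(data), target)
            loss.backward()
    except RuntimeError as err:
        if "returned nan values" not in str(err):
            raise  # unrelated error: do not swallow it
        # Log whatever helps offline debugging, e.g. batch statistics.
        print(f"Skipping batch with NaN gradients: {err}")
        return False
    optimizer.step()
    return True
```

Skipping batches hides the symptom only; if many batches are skipped, the root cause (data, learning rate, or an unstable operation) still needs fixing.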

✓

Prevention

Use gradient clipping, validate inputs for NaNs/Infs, apply stable numerical operations, and tune learning rate to maintain stable gradients throughout training.

Python 3.9+ · torch >=1.0.0 · tested on 2.0.x

Verified 2026-04
