RuntimeError
builtins.RuntimeError (raised by torch.autograd)
Stack trace
Traceback (most recent call last):
  File "train.py", line 45, in <module>
    loss.backward()  # triggers RuntimeError
RuntimeError: variable needed for gradient computation is None
Why it happens
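This error occurs when a tensor that the backward pass depends on is missing from the computation graph: it was detached, overwritten, modified in place after autograd saved it, or never tracked by autograd at all. Backpropagation then cannot walk from the loss back to the tensors that require gradients. PyTorch reports these cases with messages such as "element 0 of tensors does not require grad and does not have a grad_fn" (the loss has no path back to any tensor with requires_grad=True) or "one of the variables needed for gradient computation has been modified by an inplace operation" (a tensor saved for the backward pass was changed in place).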
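The graph that backward() walks is visible on each tensor. A minimal sketch, reusing the x * 2 computation from the broken example further down: a tensor produced by a tracked operation carries a grad_fn node, while .detach() returns a tensor with the same values but no grad_fn and requires_grad=False, so nothing computed from it can reach x.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)  # leaf tensor tracked by autograd
y = x * 2          # non-leaf: autograd records a grad_fn node for the multiply
z = y.detach()     # same values as y, but cut out of the graph

# Leaves have requires_grad=True but no grad_fn; y has both; z has neither.
for name, t in [("x", x), ("y", y), ("z", z)]:
    node = type(t.grad_fn).__name__ if t.grad_fn is not None else None
    print(name, t.requires_grad, node)
```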
Detection
Before calling backward(), check that the loss and the intermediate tensors it depends on are connected to the computation graph: a connected non-leaf tensor has requires_grad=True and a grad_fn that is not None. Use assertions to catch None values and detached tensors early.
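One way to turn this into an early, readable failure is a small check run just before backward(). The helper name assert_connected is hypothetical, a sketch rather than a PyTorch API; it only inspects the standard requires_grad and grad_fn attributes. Note that Python skips assert statements when run with -O.

```python
import torch

def assert_connected(loss):
    """Hypothetical helper: fail with a clear message before loss.backward()."""
    assert loss is not None, "loss is None"
    assert isinstance(loss, torch.Tensor), f"loss is a {type(loss).__name__}, not a tensor"
    assert loss.requires_grad and loss.grad_fn is not None, (
        "loss is not attached to the autograd graph "
        "(was it detached, converted to NumPy, or computed under torch.no_grad()?)"
    )

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

good_loss = (x * 2).sum()
assert_connected(good_loss)   # passes silently
good_loss.backward()

bad_loss = (x * 2).detach().sum()
try:
    assert_connected(bad_loss)
    failure = None
except AssertionError as err:
    failure = str(err)
print(failure)
```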
Causes & fixes
An intermediate tensor was detached or converted to a numpy array, breaking the computation graph.
Do not call .detach(), .numpy(), or .item() on values the loss depends on; keep that computation in PyTorch tensor operations and convert to NumPy only after backward().
An in-place operation modified a tensor that autograd had saved for the backward pass, invalidating the graph.
Replace in-place operations (e.g., tensor += value or tensor.add_(value)) with out-of-place ones (e.g., tensor = tensor + value) so the saved value stays intact.
A tensor was overwritten with None or a non-tensor value during the forward pass.
Check your forward method to ensure all variables used in loss computation are valid tensors and not None.
torch.no_grad() or .detach() was applied to computations the loss depends on.
Use torch.no_grad() only for computations you do not train through (e.g., evaluation) and .detach() only where you deliberately want to stop gradient flow; tensors involved in the loss must stay attached to the graph.
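The in-place cause can be reproduced in a few lines. A minimal sketch, assuming exp() as the example operation because its backward pass reuses its own output: modifying that output with += makes the later backward() fail, while the out-of-place y + 1 leaves the saved tensor untouched.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Broken: exp() saves its output for backward, and += overwrites that saved tensor.
y = x.exp()
y += 1
try:
    y.sum().backward()
    inplace_error = None
except RuntimeError as err:
    inplace_error = str(err)  # "...has been modified by an inplace operation..."
print(inplace_error)

# Fixed: y + 1 allocates a new tensor, so exp()'s saved output is unchanged.
y = x.exp()
y = y + 1
y.sum().backward()
print(x.grad)  # d/dx sum(exp(x) + 1) = exp(x)
```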
Code: broken vs fixed
import torch
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * 2
z = y.detach() # breaks gradient tracking
loss = z.sum()
loss.backward()  # RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

import torch
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * 2
z = y  # no detach, so z stays connected to x through the graph
loss = z.sum()
loss.backward()  # works correctly
print(x.grad)  # tensor([2., 2., 2.]), since d(sum(2 * x))/dx = 2
Workaround
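Wrap the backward call in try/except RuntimeError, then inspect the intermediate tensors for None values or detached status (requires_grad=False). A detached tensor cannot be reattached afterwards, so fix the forward pass so those tensors stay in the graph, then recompute the loss.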
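A sketch of that diagnostic loop. The helper backward_or_report and the idea of passing intermediates as a name-to-tensor dict are hypothetical, not part of PyTorch; the only check used is whether each value is a tensor with requires_grad=True. Leaf tensors such as x have no grad_fn, which is why grad_fn is not tested here.

```python
import torch

def backward_or_report(loss, intermediates):
    """Hypothetical helper: run loss.backward(); on RuntimeError, return the
    names of intermediates that are None, not tensors, or detached."""
    try:
        loss.backward()
        return []
    except RuntimeError as err:
        print("backward failed:", err)
        return [
            name
            for name, t in intermediates.items()
            if not isinstance(t, torch.Tensor) or not t.requires_grad
        ]

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * 2
z = y.detach()  # the bug being diagnosed
suspects = backward_or_report(z.sum(), {"x": x, "y": y, "z": z})
print(suspects)  # ['z']

# After fixing the forward pass (no detach), rebuild the loss and retry.
z = y
suspects_after_fix = backward_or_report(z.sum(), {"x": x, "y": y, "z": z})
```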
Prevention
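Avoid in-place operations on tensors autograd has saved, and do not detach tensors the loss depends on. To locate a failing operation, enable anomaly detection with torch.autograd.set_detect_anomaly(True): the backward error then includes a traceback of the forward operation that created the failing node. Tensor hooks (Tensor.register_hook) can confirm that gradients actually reach a given intermediate tensor.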
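Both tools can be combined in a short debugging run. A minimal sketch, assuming a toy x * 2 graph: set_detect_anomaly is used as a context manager around the forward and backward passes, and a hook on the intermediate y records the gradient that reaches it. If y were detached before the loss, the hook would never fire. Anomaly detection slows execution, so enable it only while debugging.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
grads_seen = []

# Anomaly mode records forward tracebacks so a backward error names its source op.
with torch.autograd.set_detect_anomaly(True):
    y = x * 2
    # The hook runs during backward only if gradient actually flows through y.
    y.register_hook(lambda grad: grads_seen.append(grad.clone()))
    loss = y.sum()
    loss.backward()

print(len(grads_seen))  # 1: y lies on the path from loss back to x
print(grads_seen[0])    # tensor([1., 1., 1.])
```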