In-place operations: underscore suffix
Why this matters
In-place operations can save memory and improve performance, but if misused they break gradient computation and create bugs that are hard to trace. Understanding when to use them is critical for writing efficient training loops without losing gradients.
Explanation
In-place operations are PyTorch functions with a trailing underscore that modify a tensor's values directly rather than returning a new tensor. For example, tensor.add_(5) adds 5 to every element of tensor and modifies tensor itself, whereas tensor.add(5) returns a new tensor with the result. Mechanically, in-place ops write directly to the tensor's underlying memory buffer (the storage), which avoids allocating a new buffer for the result and therefore uses less memory and is often faster.
However, this has a hidden cost: if a tensor is part of a computation graph, modifying it in-place can overwrite values that the backward pass needs. PyTorch's autograd engine guards against this in two ways. An in-place op on a leaf tensor with requires_grad=True is rejected immediately with a RuntimeError, unless it runs inside torch.no_grad(). For every other tensor, autograd keeps a version counter that each in-place op increments; if a tensor saved for the backward pass has changed since it was saved, backward() raises a RuntimeError instead of computing wrong gradients.
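To make the bookkeeping concrete, here is a minimal sketch of how autograd notices an in-place change to an intermediate (non-leaf) tensor. It reads `_version`, which is an internal attribute and is used here only for illustration; the tensor values and variable names are assumptions made up for the example.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)  # leaf tensor

# exp() saves its output for backward, because d/dx exp(x) = exp(x).
h = x.exp()
version_before = h._version  # internal counter, shown for illustration only

# In-place on a non-leaf tensor is accepted at call time...
h.add_(1)
version_after = h._version

# ...but backward() sees that the saved output changed and refuses to run.
try:
    h.sum().backward()
    backward_error = None
except RuntimeError as err:
    backward_error = err

print(version_before, version_after)
print("backward failed:", backward_error is not None)
```

The error surfaces at backward(), not at the line that did the overwrite, which is why these bugs can be hard to locate.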
Analogy
Think of a register in a CPU: overwriting a value in the register it already occupies (in-place) is cheap because no new storage is needed. Writing the result to a fresh location is like spilling to main memory: slower, but the original value survives. Overwriting is the right optimization only when you know nothing else still needs the old value.
Code
import torch

# Create a tensor with gradient tracking enabled
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
print(f"Original x: {x}")
print(f"x requires_grad: {x.requires_grad}")

# Out-of-place operation: creates a new tensor, safe for gradients
y = x + 5
print(f"\nAfter y = x + 5: {y}")
print(f"x unchanged: {x}")

# In-place on a tensor WITHOUT requires_grad: safe here, no graph depends on a
a = torch.tensor([1.0, 2.0, 3.0])
ptr_before = a.data_ptr()  # address of a's storage before the update
print(f"\nBefore a.add_(5): {a}")
a.add_(5)
print(f"After a.add_(5): {a}")
print(f"a reuses the same memory: {a.data_ptr() == ptr_before}")

# In-place on a leaf tensor with requires_grad=True: rejected immediately
x2 = torch.tensor([2.0, 3.0], requires_grad=True)
loss = (x2 * 2).sum()
print(f"\nBefore in-place attempt, x2: {x2}")
try:
    x2.add_(10)  # Raises here, before backward() is ever called
    loss.backward()  # Never reached
except RuntimeError as e:
    print(f"Error caught: {type(e).__name__}")
    print(f"Message: {e}")

# In-place on a leaf that requires grad is allowed inside torch.no_grad()
x3 = torch.tensor([1.0, 2.0], requires_grad=True)
output = x3 * 3
output.backward(torch.ones_like(output))
print(f"\nGradients computed: {x3.grad}")
with torch.no_grad():
    x3.add_(100)  # Not recorded by autograd, as in an optimizer step
print(f"x3 after in-place add: {x3}")
Output
Original x: tensor([1., 2., 3.], requires_grad=True)
x requires_grad: True

After y = x + 5: tensor([6., 7., 8.], grad_fn=<AddBackward0>)
x unchanged: tensor([1., 2., 3.], requires_grad=True)

Before a.add_(5): tensor([1., 2., 3.])
After a.add_(5): tensor([6., 7., 8.])
a reuses the same memory: True

Before in-place attempt, x2: tensor([2., 3.], requires_grad=True)
Error caught: RuntimeError
Message: a leaf Variable that requires grad is being used in an in-place operation.

Gradients computed: tensor([3., 3.])
x3 after in-place add: tensor([101., 102.], requires_grad=True)
What just happened?
The code demonstrated the difference between in-place (add_) and out-of-place (add) operations. On a tensor without requires_grad, the in-place op modifies the tensor directly and reuses the same memory. On a leaf tensor with requires_grad=True, PyTorch rejects the in-place call as soon as it is made, raising a RuntimeError rather than letting gradients be silently corrupted. Such a tensor can still be updated in-place when autograd is not recording, for example inside a torch.no_grad() block, which is how optimizers apply parameter updates after .backward() has computed the gradients.
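One place where in-place updates on a tensor that requires gradients are routine is the parameter update itself. The following is a rough sketch of a single hand-written gradient-descent step, not how any particular optimizer is implemented; the parameter `w`, the loss, and the learning rate `lr` are hypothetical values chosen for illustration.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)  # hypothetical parameter
lr = 0.1                                           # hypothetical learning rate

loss = (w ** 2).sum()
loss.backward()            # w.grad is now 2 * w = [2., -4.]

with torch.no_grad():      # autograd does not record ops inside this block
    w.sub_(lr * w.grad)    # in-place update of a leaf that requires grad
    w.grad.zero_()         # reset the accumulated gradient in-place

print(w)
```

Because the update happens under torch.no_grad(), `w` keeps requires_grad=True and can be used in the next forward pass as usual.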
Common gotcha
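The most common mistake: using in-place operations on intermediate tensors inside a loss computation. For example, calling hidden.relu_() inside a forward pass when the operation that produced hidden saved its output for backward (as exp, sigmoid and tanh do), then trying to backpropagate. The error message is cryptic: 'one of the variables needed for gradient computation has been modified by an inplace operation'. It is raised by backward(), far from the line that caused it, so developers often blame their loss function instead of finding the in-place op buried several layers up the call stack. Always search for _ suffixes (and inplace=True arguments) in your forward pass if you get this error. Enabling torch.autograd.set_detect_anomaly(True) while debugging adds the forward traceback of the operation whose backward failed, which narrows the search.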
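The gotcha can be reproduced in a few lines. This is a minimal sketch under assumed names: the `forward` function, the tensor shapes, and the `inplace` flag are hypothetical and exist only to contrast the failing and the working variant of the same computation.

```python
import torch

def forward(x, w, inplace):
    # sigmoid saves its output, which its backward formula reuses
    hidden = torch.sigmoid(x @ w)
    if inplace:
        hidden.relu_()          # overwrites the tensor sigmoid saved
    else:
        hidden = hidden.relu()  # new tensor; sigmoid's output is untouched
    return hidden.sum()

torch.manual_seed(0)
x = torch.randn(4, 3)
w = torch.randn(3, 2, requires_grad=True)

try:
    forward(x, w, inplace=True).backward()
    inplace_error = None
except RuntimeError as err:
    inplace_error = err

forward(x, w, inplace=False).backward()  # same math, no error
print("in-place failed:", inplace_error is not None)
print("grad shape:", tuple(w.grad.shape))
```

Whether an in-place op fails depends on whether the producing operation saved that tensor for backward. The same relu_ applied directly to the result of a matrix multiply would not raise, because matrix multiplication saves its inputs rather than its output.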
Error recovery
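RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
Raised at the in-place call itself. Use the out-of-place version, or wrap the update in torch.no_grad() if it is a deliberate parameter update.
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
Raised by backward(). Find the in-place op in the forward pass that touched a saved tensor and replace it with its out-of-place version.
Experienced dev note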
In production, the real win of in-place ops is often not the memory saved but the clarity: a trailing underscore tells you immediately that the tensor was modified. Use in-place ops liberally on tensors that never touch gradients (e.g., data normalization, metric tracking). But in the forward pass and loss computation, treat underscores as red flags. It's often worth the tiny performance hit to avoid the debugging nightmare of a gradient error traced to an op three layers deep in your model. Also, the savings are frequently too small to stand out in a profile; they tend to matter only when the tensors involved are large, such as activations for big batches on large models. Don't prematurely optimize with in-place ops; add them only if profiling shows memory is the bottleneck.
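As a sketch of the "never touches gradients" case, the snippet below normalizes a batch and accumulates a metric entirely in place. The batch contents, the per-step counts, and all variable names are made-up illustrations, not part of any real pipeline.

```python
import torch

torch.manual_seed(0)
batch = torch.rand(8, 3) * 255   # hypothetical raw inputs, no requires_grad
ptr = batch.data_ptr()           # remember where the data lives

# Per-feature statistics are computed before the batch is overwritten.
mean = batch.mean(dim=0, keepdim=True)
std = batch.std(dim=0, keepdim=True)
batch.sub_(mean).div_(std)       # in-place ops return the same tensor, so they chain

same_buffer = batch.data_ptr() == ptr

# Metric tracking: accumulate into one scalar tensor instead of creating new ones.
running_correct = torch.zeros(())
for n_correct in (5, 7, 6):      # hypothetical per-step counts
    running_correct.add_(n_correct)

print("same buffer:", same_buffer)
print("total correct:", running_correct.item())
```

Nothing here is part of a computation graph, so the underscores carry no autograd risk and simply document that the tensors are being mutated.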
Check your understanding
Why does x.add_(5) raise an error when x has requires_grad=True and is part of a loss computation, but x.add(5) does not? What is the autograd system protecting you from?
Answer hint
A correct answer must explain that in-place operations overwrite the original tensor values needed for the backward pass. Autograd needs those original values to compute gradients via the chain rule, so modifying them in-place would cause incorrect or missing gradients. The non-in-place operation creates a new tensor, leaving the original intact for gradient computation.