Code Beginner easy · 4 min

In-place operations: underscore suffix

What you will learn
PyTorch uses trailing underscores to mark operations that modify tensors directly instead of creating new ones.

Why this matters

In-place operations can save memory by skipping a new allocation, but used carelessly they break gradient computation, usually with a RuntimeError that surfaces far from the line that caused it. Knowing when they are safe is essential for writing efficient training loops that still compute correct gradients.

Skip if: the tensor is a leaf with requires_grad=True (such as a model parameter) and you are not inside a torch.no_grad() block, because autograd raises an error immediately. Also skip them in the forward pass and loss computation on any tensor whose original values an earlier operation saved for the backward pass. On other intermediate tensors in a computation graph, use in-place ops only when you are certain no gradient computation needs their old values.

Explanation

In-place operations are PyTorch functions with a trailing underscore that modify a tensor's values directly rather than returning a new tensor. For example, tensor.add_(5) adds 5 to every element of tensor and returns that same tensor, whereas tensor.add(5) returns a new tensor and leaves the original unchanged. Mechanically, an in-place op writes into the tensor's existing memory buffer (its storage) instead of allocating a new one, which saves memory and the cost of the allocation. The hidden cost involves autograd: during the forward pass, many operations save tensors (their inputs or outputs) that the backward pass will need, and an in-place write can overwrite those saved values. To catch this, every tensor carries a version counter that each in-place operation increments. Autograd then stops you in two ways. An in-place op on a leaf tensor with requires_grad=True (one created directly, such as a model parameter, rather than computed from other tensors) raises an error immediately. An in-place op on a tensor that was saved for backward raises an error when .backward() finds that the saved tensor's version has changed. In-place ops on intermediate tensors that were not saved are allowed.
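Both mechanisms can be observed directly. The sketch below uses <code>data_ptr()</code>, which returns the address of a tensor's first element, to show that <code>add_</code> reuses storage while <code>add</code> allocates, and it reads <code>_version</code>, the counter autograd checks. Note that <code>_version</code> is an internal, underscore-prefixed attribute rather than documented public API, so treat it as an implementation detail that may change between releases.

```python
import torch

t = torch.tensor([0.0, 0.0, 0.0])
ptr_before = t.data_ptr()        # address of t's storage
version_before = t._version      # internal in-place counter (implementation detail)

# In-place: writes into the existing storage and bumps the version counter
returned = t.add_(1)
same_object = returned is t                  # add_ returns the tensor it modified
same_storage = t.data_ptr() == ptr_before    # no new allocation happened
version_after = t._version

# Out-of-place: allocates a new tensor; t and its version counter are untouched
u = t.add(1)
new_storage = u.data_ptr() != ptr_before

print(f"same object: {same_object}, same storage: {same_storage}")
print(f"version before/after add_: {version_before} -> {version_after}")
print(f"t: {t}, u: {u}, add allocated new storage: {new_storage}")
```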

Analogy

Think of a register in a CPU: overwriting the value in the same register (in-place) is fast because nothing new has to be allocated, but the old value is gone. Writing the result to a fresh location keeps the original intact, at the cost of extra space and a copy. You reuse the register only when you know nobody else still needs the old value; in PyTorch, that "somebody else" is usually autograd.

Code

python
import torch

# Create a leaf tensor with gradient tracking enabled
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
print(f"Original x: {x}")
print(f"x requires_grad: {x.requires_grad}")

# Out-of-place operation: returns a new tensor, safe for gradients
y = x + 5
print(f"\nAfter y = x + 5: {y}")
print(f"x unchanged: {x}")

# In-place operation on a tensor WITHOUT requires_grad: always safe
a = torch.tensor([1.0, 2.0, 3.0])
id_before, ptr_before = id(a), a.data_ptr()  # record identity and storage address
print(f"\nBefore a.add_(5): {a}")
a.add_(5)
print(f"After a.add_(5): {a}")
print(f"a is the same object: {id(a) == id_before}")
print(f"a uses the same storage: {a.data_ptr() == ptr_before}")

# In-place on a LEAF tensor with requires_grad=True: fails immediately
x2 = torch.tensor([2.0, 3.0], requires_grad=True)
print(f"\nBefore x2.add_(10): {x2}")
try:
    x2.add_(10)  # raises here; no backward() call is needed
except RuntimeError as e:
    print(f"Error caught: {type(e).__name__}")
    print(f"Message: {str(e).split(':')[0]}")

# In-place on a tensor autograd SAVED for backward: fails at backward()
x3 = torch.tensor([0.0, 1.0], requires_grad=True)
z = x3.exp()  # exp saves its output z to compute its gradient
z.add_(1)     # the in-place op itself succeeds...
try:
    z.sum().backward()  # ...but backward() detects that the saved z changed
except RuntimeError as e:
    print(f"\nError caught: {type(e).__name__}")
    print(f"Message: {str(e).split(':')[0]}")

# Updating a leaf in-place after backward: do it inside torch.no_grad()
x4 = torch.tensor([1.0, 2.0], requires_grad=True)
output = x4 * 3
output.backward(torch.ones_like(output))
print(f"\nGradients computed: {x4.grad}")
with torch.no_grad():
    x4.add_(100)  # allowed: autograd does not record operations inside no_grad
print(f"x4 after in-place add: {x4}")
Output
Original x: tensor([1., 2., 3.], requires_grad=True)
x requires_grad: True

After y = x + 5: tensor([6., 7., 8.], grad_fn=<AddBackward0>)
x unchanged: tensor([1., 2., 3.], requires_grad=True)

Before a.add_(5): tensor([1., 2., 3.])
After a.add_(5): tensor([6., 7., 8.])
a is the same object: True
a uses the same storage: True

Before x2.add_(10): tensor([2., 3.], requires_grad=True)
Error caught: RuntimeError
Message: a leaf Variable that requires grad is being used in an in-place operation.

Error caught: RuntimeError
Message: one of the variables needed for gradient computation has been modified by an inplace operation

Gradients computed: tensor([3., 3.])
x4 after in-place add: tensor([101., 102.], requires_grad=True)

What just happened?

The code demonstrated the difference between in-place (<code>add_</code>) and out-of-place (<code>add</code>) operations. On a tensor without <code>requires_grad</code>, <code>add_</code> modifies the tensor directly: the object and its storage address (<code>data_ptr()</code>) are the same before and after the call. With <code>requires_grad=True</code>, autograd stops you in two ways. An in-place op on a leaf tensor raises a RuntimeError immediately. An in-place op on an intermediate tensor is allowed, but if an earlier operation saved that tensor for the backward pass (as <code>exp</code> saves its output), <code>.backward()</code> raises a RuntimeError instead of silently computing a wrong gradient. The final example shows how to modify a leaf after <code>.backward()</code>: wrap the update in <code>torch.no_grad()</code>, the same pattern optimizers use to update parameters.

Common gotcha

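The most common mistake: an in-place operation on an intermediate tensor that an earlier operation saved for the backward pass. For example, <code>h = torch.sigmoid(z)</code> followed by <code>h.mul_(2)</code> in a forward pass: <code>sigmoid</code> keeps its output to compute its gradient, so the in-place multiply overwrites a value autograd still needs. The forward pass runs without complaint; the error appears only at <code>.backward()</code>, as 'one of the variables needed for gradient computation has been modified by an inplace operation', and developers often blame their loss function instead of finding the in-place op buried several layers up the call stack. Not every in-place op on an intermediate tensor is a problem (<code>h.relu_()</code> right after a linear layer is fine, because the linear layer does not need its output for its gradient), which makes these bugs easy to miss. If you get this error, search the forward pass for _ suffixes and <code>inplace=True</code> arguments, or call <code>torch.autograd.set_detect_anomaly(True)</code> so PyTorch also prints a traceback of the forward operation whose gradient computation failed.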
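Whether an in-place op on an intermediate tensor breaks backward depends on whether an earlier operation saved that tensor. A minimal sketch contrasting the two cases, assuming a small <code>torch.nn.Linear</code> layer as a stand-in for a real forward pass (the shapes and seed are arbitrary):

```python
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(4, 3)   # stand-in for a real model layer
inp = torch.randn(2, 4)

# Safe: Linear does not need its own output for backward, so relu_ may overwrite it
h = layer(inp)
h.relu_()
h.sum().backward()
safe_grad_ok = layer.weight.grad is not None

# Unsafe: sigmoid saves its output for backward; mul_ overwrites that saved value
layer.zero_grad()
s = torch.sigmoid(layer(inp))
s.mul_(2)                       # the forward pass still runs without complaint
try:
    s.sum().backward()          # the error surfaces only here
    unsafe_failed = False
except RuntimeError as e:
    unsafe_failed = "modified by an inplace operation" in str(e)

print(f"relu_ after Linear: backward ok = {safe_grad_ok}")
print(f"mul_ after sigmoid: backward failed = {unsafe_failed}")
```

The first case is also why <code>nn.ReLU(inplace=True)</code> works after a linear layer.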

Error recovery

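RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
You called an in-place operation (suffix _) on a leaf tensor with requires_grad=True, such as a model parameter or a tensor you created directly. If you need a new value, use the out-of-place version and keep the result under a new name: replace <code>x.add_(5)</code> with <code>y = x.add(5)</code>. If you really do mean to overwrite the leaf (for example, a manual weight update), do it inside a <code>with torch.no_grad():</code> block.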
RuntimeError: one of the variables needed for gradient computation has been modified by an in-place operation
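An in-place operation modified a tensor that autograd had saved for the backward pass (an input or output of an earlier operation) before backward() ran. The full message names the backward node that needed it (for example, ExpBackward0) and the tensor's expected and actual version numbers. Find the in-place op that touches that tensor after it was saved, then remove the underscore, apply the op to a .clone() instead, or move it after backward().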
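A sketch of the fixes for both errors above, assuming a hypothetical two-element weight w and learning rate lr for the leaf case, and reusing the exp pattern for the saved-tensor case. The manual update is a simplified version of what an optimizer such as torch.optim.SGD does without momentum; it is not that class's actual implementation.

```python
import torch

# Leaf error fixes: out-of-place op for new values, no_grad for intended updates
w = torch.tensor([1.0, -2.0], requires_grad=True)  # hypothetical weight
lr = 0.1                          # hypothetical learning rate, for illustration only
(w ** 2).sum().backward()         # w.grad == 2 * w == [2., -4.]

shifted = w.add(5)                # new tensor; w is untouched and stays a leaf
with torch.no_grad():
    w.sub_(lr * w.grad)           # manual SGD step, not recorded by autograd
w.grad.zero_()                    # clear the accumulated gradient before the next step

# Saved-tensor error fix: modify a clone instead of the saved tensor
x = torch.tensor([0.0, 1.0], requires_grad=True)
y = x.exp()                       # exp saves its output y for the backward pass
y_shifted = y.clone()
y_shifted.add_(1)                 # in-place on the copy leaves the saved y intact
y_shifted.sum().backward()        # d/dx sum(exp(x) + 1) = exp(x)

print(f"shifted: {shifted}")
print(f"w after update: {w}")
print(f"x.grad: {x.grad}")
```

The out-of-place y + 1 fixes the second case equally well; clone() is the option to reach for when the surrounding code is written around in-place updates.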

Experienced dev note

In production, the real win of in-place ops isn't the memory saved: it's the clarity. A trailing underscore tells you immediately that the tensor was modified. Use in-place ops freely on tensors that never touch gradients (e.g., data normalization, metric tracking). In the forward pass and loss computation, treat underscores as red flags: it's usually worth the small performance cost of an out-of-place op to avoid debugging an autograd error raised far from its cause. The memory savings are also often too small to stand out in a profiler; they matter mainly when the tensors involved are large. Don't prematurely optimize with in-place ops; add them only if profiling shows memory is the bottleneck.
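A minimal sketch of the two gradient-free uses mentioned above, with random values standing in for a real data batch and hypothetical per-step loss values. No tensor here requires gradients, so the in-place ops are safe:

```python
import torch

torch.manual_seed(0)
batch = torch.rand(8, 3) * 255   # stand-in for raw pixel data; no gradients involved
mean = batch.mean(dim=0)
std = batch.std(dim=0)

# In-place normalization: chained ops reuse batch's storage instead of allocating copies
ptr = batch.data_ptr()
batch.sub_(mean).div_(std)
print(f"storage reused: {batch.data_ptr() == ptr}")

# In-place metric accumulation
running_loss = torch.zeros(())
for step_loss in [torch.tensor(0.5), torch.tensor(0.25)]:  # hypothetical per-step losses
    running_loss.add_(step_loss.detach())  # detach keeps the metric out of any graph
print(f"running loss: {running_loss}")
```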

Check your understanding

Why does x.add_(5) raise an error when x has requires_grad=True and is part of a loss computation, but x.add(5) does not? What is the autograd system protecting you from?

Show answer hint

A correct answer must explain that in-place operations overwrite tensor values that the backward pass may need. Autograd needs those original values to compute gradients via the chain rule, so modifying them in-place would produce incorrect gradients; PyTorch detects this with per-tensor version counters and raises an error instead (for a leaf tensor, the in-place op is rejected immediately). The out-of-place operation creates a new tensor, leaving the original intact for gradient computation.

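VERSION These checks are not new to PyTorch 2.x: autograd's version-counter checks for in-place modifications predate 2.0, so the errors shown here appear in any recent release. The common way to still get silently wrong gradients is to bypass the checks, for example by modifying x.data in-place; x.detach() shares the version counter with x, so autograd still catches in-place changes made through it.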
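The difference between .data and .detach() is easy to demonstrate. In this sketch, both variants add 1 to the output of exp, which exp saved for its backward pass. Only the .detach() variant is caught; the .data variant silently produces the gradient exp(x) + 1 instead of exp(x):

```python
import torch

# Through .detach(): shares the version counter, so autograd catches the change
x = torch.tensor([0.0, 1.0], requires_grad=True)
y = x.exp()
y.detach().add_(1)
try:
    y.sum().backward()
    detach_caught = False
except RuntimeError:
    detach_caught = True

# Through .data: not tracked, so backward silently uses the modified saved values
x2 = torch.tensor([0.0, 1.0], requires_grad=True)
y2 = x2.exp()
y2.data.add_(1)
y2.sum().backward()                      # no error is raised
correct_grad = torch.exp(x2.detach())    # true gradient of sum(exp(x))

print(f".detach() change caught: {detach_caught}")
print(f".data gradient: {x2.grad}, correct gradient: {correct_grad}")
```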
NEXT

Gradient computation itself: understanding <code>requires_grad</code> and how to call <code>.backward()</code> to compute gradients for training.
