.detach(): stopping gradient flow
Why this matters
You need to stop gradient flow when you want to use a tensor's values in a computation without backpropagating into whatever produced them. This comes up in loss calculations, target values, and when freezing part of a model. Without it, you'll train things you meant to keep fixed.
Explanation
What it is: .detach() returns a new tensor that shares the same data as the original, but is no longer part of PyTorch's autograd graph. Gradients will not flow backward through it.
How it works mechanically: When you call loss.backward(), PyTorch traces back through every operation in the computational graph, computing gradients. If a tensor is detached, that backward trace stops: no gradient is computed for operations that depend only on the detached tensor. The tensor's values are still usable for forward computation; only the gradient tracking is severed.
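As a minimal sketch of that mechanism (the names w, h, and h_cut are illustrative only), the following shows the backward pass treating a detached value as a plain constant:

```python
import torch

w = torch.tensor(3.0, requires_grad=True)
h = w * 2            # tracked: h has a grad_fn pointing back to w
h_cut = h.detach()   # same value (6.0), but no grad_fn

out = h_cut * w      # only the direct use of w is tracked here
out.backward()

# h_cut is treated as the constant 6.0, so d(out)/dw = 6.0.
# Without .detach(), out would be 2 * w * w and the gradient would be 4 * w = 12.0.
print(h_cut.grad_fn)  # None
print(w.grad)         # tensor(6.)
```

The forward value of out is identical either way; only the gradient that reaches w differs.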
When to use it: Use .detach() when a tensor's values matter for the computation but nothing upstream of it should be updated. Classic cases: target values that are themselves computed by a network (we want to predict them, not adjust them), or a two-network architecture where only one network's weights should be updated.
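The two-network case might look like the sketch below. The encoder and head modules, their sizes, and the squared-output loss are all hypothetical, chosen only to keep the example small:

```python
import torch
from torch import nn

torch.manual_seed(0)
encoder = nn.Linear(4, 3)  # hypothetical network we want to keep fixed
head = nn.Linear(3, 1)     # hypothetical network we want to train

x = torch.randn(8, 4)
features = encoder(x).detach()      # cut the graph between the two networks
loss = head(features).pow(2).mean() # placeholder loss for illustration
loss.backward()

# Gradients stop at the detach point: only the head's parameters received any.
encoder_untouched = all(p.grad is None for p in encoder.parameters())
head_has_grads = all(p.grad is not None for p in head.parameters())
print(encoder_untouched, head_has_grads)  # True True
```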
Analogy
Imagine a student solving a math problem with a calculator. The calculator gives a number (the tensor's value). Normally, if the answer is wrong, you debug both the student's logic and the calculator (backprop updates both). With <code>.detach()</code>, you cut off the cable to the calculator: its number still appears in the work, but blame for errors never flows back to fix it.
Code
import torch
# Create a simple tensor that requires gradients
x = torch.tensor([2.0, 3.0], requires_grad=True)
print(f"Original x: {x}")
print(f"Original x.requires_grad: {x.requires_grad}")
# Detach the tensor
x_detached = x.detach()
print(f"\nDetached x_detached: {x_detached}")
print(f"Detached x_detached.requires_grad: {x_detached.requires_grad}")
# Compute a loss using the detached tensor
y = x.sum() # This WILL accumulate gradients
z = x_detached.sum() # This will NOT accumulate gradients
loss = y + z
print(f"\nLoss: {loss}")
# Backpropagate
loss.backward()
print(f"\nGradient of x after backward: {x.grad}")
print("Note: gradient exists because y = x.sum() kept the connection")
print("z = x_detached.sum() did not contribute to x's gradient")
Output:
Original x: tensor([2., 3.], requires_grad=True)
Original x.requires_grad: True

Detached x_detached: tensor([2., 3.])
Detached x_detached.requires_grad: False

Loss: tensor(10., grad_fn=<AddBackward0>)

Gradient of x after backward: tensor([1., 1.])
Note: gradient exists because y = x.sum() kept the connection
z = x_detached.sum() did not contribute to x's gradient
What just happened?
We created a tensor with gradient tracking enabled. When we called <code>.detach()</code>, we got a new tensor with the same values but <code>requires_grad=False</code> and no connection to the autograd graph. Both tensors were used in the loss, but only the original <code>x</code> received gradients during backprop because only <code>y = x.sum()</code> kept the computational graph intact. The detached operation <code>z = x_detached.sum()</code> contributed to the loss value but not to any gradient computation.
Common gotcha
The most common mistake is assuming .detach() creates an independent copy. It doesn't; the detached tensor shares the same underlying data as the original, so modifying it in-place changes the original too. When you need a copy that cannot affect the original, use .clone().detach().
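A short sketch of the difference, using illustrative variable names (shared and independent):

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

shared = x.detach()               # same storage as x
independent = x.clone().detach()  # separate storage

# An in-place write through the detached tensor also changes x.
shared[0] = 100.0

print(x)            # tensor([100.,   2.], requires_grad=True)
print(independent)  # tensor([1., 2.])
print(shared.data_ptr() == x.data_ptr())       # True: same memory
print(independent.data_ptr() == x.data_ptr())  # False: a real copy
```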
Error recovery
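Both errors covered in this section can be reproduced in a few lines. The sketch below is only an illustration of how they typically arise; the variable names are made up for the example:

```python
import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)

# Case 1: .grad is None on a detached tensor, so indexing it fails.
detached = x.detach()
try:
    detached.grad[0]
    first_error = None
except TypeError as err:
    first_error = str(err)

# Case 2: the loss depends only on detached values,
# so backward() has nothing to trace.
loss = x.detach().sum()
try:
    loss.backward()
    second_error = None
except RuntimeError as err:
    second_error = str(err)

print(first_error)
print(second_error)
```

In both cases the fix is the same: read .grad from, and build the loss on, a tensor that is still connected to the graph.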
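TypeError: 'NoneType' object is not subscriptable when accessing grad
The tensor's .grad is None, typically because you are reading .grad on a detached tensor or before backward() has run. Read .grad on the original tensor that has requires_grad=True.
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
The loss was built only from detached tensors (or tensors that never required gradients), so there is nothing to backpropagate through. Make sure at least one non-detached path connects the loss to a parameter.
Experienced dev note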
In production, .detach() is how you implement target networks in reinforcement learning (compute loss against a frozen copy of your network), stop gradients from reaching pre-trained layers during transfer learning, or implement custom loss functions that mix learnable and fixed components. The alternative, with torch.no_grad():, is for a different use case. It disables gradient tracking for an entire block of code (useful for inference), whereas .detach() is surgical: it cuts one specific tensor out of the graph while the surrounding code still tracks gradients. Learn to distinguish them early; mixing them up is a hidden cause of training failures.
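To make the contrast concrete, here is a small sketch with illustrative names. Inside the torch.no_grad() block nothing is tracked, while .detach() removes a single tensor and leaves the rest of the expression tracked:

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

# Block-level: every operation inside the context is untracked.
with torch.no_grad():
    a = w * 3
    b = a + 1
print(a.requires_grad, b.requires_grad)  # False False

# Tensor-level: only c is cut off; the "+ w" term is still tracked.
c = w * 3
d = c.detach() + w
d.backward()
print(w.grad)  # tensor(1.)
```

The gradient is 1 rather than 4 because the 3 * w term was detached and only the direct + w term contributes.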
Check your understanding
If you compute a loss as loss = model_output - detached_target and then call loss.backward(), will the model's parameters receive gradients? Why or why not?
Answer hint
Yes. The detached tensor is only one operand in the subtraction. The <code>model_output</code> side of the graph is still connected, so the model's parameters receive gradients and will be updated on the next optimizer step. The detached target contributes its values to the loss but no gradient; only the model output's computation graph is traced backward.
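The answer can be checked with a small experiment. Everything here is an assumption made for illustration: the two nn.Linear modules, the SGD optimizer, and the squared-error reduction (added so the loss is a scalar that backward() accepts):

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(2, 1)
target_net = nn.Linear(2, 1)  # hypothetical source of the target values
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(5, 2)
model_output = model(x)
detached_target = target_net(x).detach()

loss = (model_output - detached_target).pow(2).mean()
loss.backward()

# backward() only fills in .grad; the weights change when the optimizer steps.
weight_before = model.weight.detach().clone()
optimizer.step()
weight_changed = not torch.equal(weight_before, model.weight)

print(model.weight.grad is not None)   # True: the model side is connected
print(target_net.weight.grad is None)  # True: the target side is cut off
print(weight_changed)
```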
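Before PyTorch 0.4.0, tensors had to be wrapped as Variable(tensor, requires_grad=True) to take part in autograd, and developers commonly used .data to get values out of the graph. Since PyTorch 0.4.0 (April 2018), tensors and variables are merged, and .detach() is the standard. Current PyTorch 2.11.x (March 2026) heavily discourages .data access, because in-place changes made through .data are invisible to autograd; always use .detach().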
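The practical difference between the two shows up with in-place writes. The sketch below is modeled on the example in the PyTorch 0.4.0 migration notes, with illustrative variable names: autograd notices the write made through .detach() and raises an error, while the same write through .data goes unnoticed and yields a wrong gradient.

```python
import torch

# Via .detach(): the in-place write is noticed when backward() runs.
a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out = a.sigmoid()
out.detach().zero_()
try:
    out.sum().backward()
    detach_error = None
except RuntimeError as err:
    detach_error = str(err)
print(detach_error is not None)  # True: autograd refused to continue

# Via .data: the same write goes unnoticed and the gradient is silently wrong.
b = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out_b = b.sigmoid()
out_b.data.zero_()
out_b.sum().backward()
print(b.grad)  # tensor([0., 0., 0.]) instead of the true sigmoid gradient
```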