Code Beginner easy · 4 min

torch.no_grad(): disabling gradient tracking

What you will learn

torch.no_grad() tells PyTorch to stop recording operations for automatic differentiation, which makes inference faster and saves memory.

Why this matters

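During inference (evaluation, prediction) you don't need gradients, but by default PyTorch still records every operation on tensors that require them so it can backpropagate later. torch.no_grad() prevents this waste: the forward pass keeps no intermediate activations for a backward pass, which frees GPU memory and usually speeds up your code. The size of the gain depends on the model and hardware, so measure it rather than assume a fixed figure such as 30-50%.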

Skip if: During training, never use torch.no_grad() on your forward pass. Your model needs gradient tracking to compute loss and backpropagate. Only use it for validation loops, testing, or deployment.
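The split between training and validation can be sketched as follows. This is a minimal illustration, assuming a toy nn.Linear model and random tensors in place of real data; the variable names are placeholders, not part of any real pipeline.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Hypothetical toy data standing in for real train/validation batches
train_x, train_y = torch.randn(8, 10), torch.randn(8, 1)
val_x, val_y = torch.randn(4, 10), torch.randn(4, 1)

# Training step: gradient tracking stays ON so backward() can work
model.train()
optimizer.zero_grad()
train_loss = loss_fn(model(train_x), train_y)
train_loss.backward()
optimizer.step()

# Validation step: gradient tracking OFF, nothing to backpropagate
model.eval()
with torch.no_grad():
    val_loss = loss_fn(model(val_x), val_y)

print(train_loss.requires_grad)  # True
print(val_loss.requires_grad)    # False
```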

Explanation

What it is: torch.no_grad() is a context manager (or decorator) that disables automatic differentiation. When active, PyTorch skips building the computational graph that tracks operations for gradient computation.
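Both forms mentioned above can be sketched side by side. The helper name predict is hypothetical and the model is a toy nn.Linear; the point is only that the decorator covers a whole function while the context manager covers one block.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 5)

# Decorator form: the whole function body runs with gradient tracking off
@torch.no_grad()
def predict(batch):  # hypothetical helper name
    return model(batch)

batch = torch.randn(2, 10)
decorated_out = predict(batch)

# Context-manager form: only the indented block is affected
with torch.no_grad():
    context_out = model(batch)

print(decorated_out.requires_grad)  # False
print(context_out.requires_grad)    # False
print(decorated_out.grad_fn)        # None: no graph was recorded
```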

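How it works mechanically: Every tensor in PyTorch has a requires_grad flag, and autograd normally records each operation whose inputs include a tensor with that flag set. torch.no_grad() switches off this recording (the "grad mode") for the current thread. Existing tensors keep their flags, but the result of any operation inside the block has requires_grad=False and no grad_fn. When you exit the context, the previous grad mode is restored. This is why it's safe to use: it only affects code inside the block.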
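A short sketch makes the mechanics visible. It uses torch.is_grad_enabled() to inspect the current grad mode and shows that an existing tensor's own flag is left alone while results computed inside the block are untracked.

```python
import torch

x = torch.ones(3, requires_grad=True)

print(torch.is_grad_enabled())      # True: the default grad mode

with torch.no_grad():
    print(torch.is_grad_enabled())  # False inside the block
    print(x.requires_grad)          # True: the existing tensor's flag is untouched
    y = x * 2
    print(y.requires_grad)          # False: the result is not tracked
    print(y.grad_fn)                # None: no graph node was recorded

print(torch.is_grad_enabled())      # True again after the block
z = x * 2
print(z.requires_grad)              # True: tracking has resumed
```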

When to use it: Use torch.no_grad() during validation, testing, and inference. Also use it when you need to modify weights manually (e.g., custom weight decay) so that the update itself is not recorded in the computational graph.
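The manual-update case might look like the sketch below: a single hand-written gradient descent step on a two-element tensor. The learning rate and the loss are arbitrary choices for illustration, not a recommended training setup.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
loss = (w ** 2).sum()
loss.backward()          # w.grad is now 2 * w = [2.0, 4.0]

lr = 0.1  # hypothetical learning rate
with torch.no_grad():
    w -= lr * w.grad     # in-place update, not recorded by autograd
    w.grad.zero_()       # reset the gradient for the next step

print(w)                 # tensor([0.8000, 1.6000], requires_grad=True)
print(w.requires_grad)   # True: w is still a trainable leaf tensor
```

Outside a no_grad block, the same in-place update on a leaf tensor that requires grad raises a RuntimeError, which is why optimizers perform their parameter updates with gradient tracking disabled.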

Analogy

Think of gradient tracking as a security camera recording every move a tensor makes through your network. During training, you need that recording to replay and understand what caused the loss. During inference, you already know what the network does: you just want the answer. torch.no_grad() turns off the camera, making everything run faster.

Code

python

import torch
import torch.nn as nn

# Create a simple model
model = nn.Linear(10, 5)
input_tensor = torch.randn(2, 10, requires_grad=True)

# WITHOUT torch.no_grad() - tracks gradients
output_with_grad = model(input_tensor)
print(f"Output requires grad: {output_with_grad.requires_grad}")
print(f"Output shape: {output_with_grad.shape}")

# WITH torch.no_grad() - no gradient tracking
with torch.no_grad():
    output_no_grad = model(input_tensor)
    print(f"Output requires grad (inside no_grad): {output_no_grad.requires_grad}")
    print(f"Output shape: {output_no_grad.shape}")

print(f"Output requires grad (after no_grad): {output_no_grad.requires_grad}")

Output

Output requires grad: True
Output shape: torch.Size([2, 5])
Output requires grad (inside no_grad): False
Output shape: torch.Size([2, 5])
Output requires grad (after no_grad): False

What just happened?

We created a model and tensor with gradient tracking enabled. When we ran the model normally, the output had requires_grad=True. Inside the torch.no_grad() context, the same model forward pass produced output with requires_grad=False. After exiting the context, the output still has requires_grad=False because it was already computed without gradients.

Common gotcha

torch.no_grad() does not change the requires_grad flag of tensors that already exist; it changes what happens to the results of operations. Any operation inside the block produces an output with requires_grad=False, even when its inputs have requires_grad=True. Factory functions are the exception: a tensor created inside the block with an explicit requires_grad=True (e.g., torch.zeros(2, requires_grad=True)) keeps that flag. Also, exiting the context doesn't retroactively attach a graph to results computed inside: they stay detached.
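The behavior is easy to check directly. The sketch below compares a tensor created by a factory function with an explicit requires_grad=True against the result of an operation, both inside a no_grad block, and then looks at what happens after the block exits.

```python
import torch

a = torch.ones(2, requires_grad=True)

with torch.no_grad():
    # Factory call with an explicit flag: the flag is honored
    leaf = torch.zeros(2, requires_grad=True)
    # Result of an operation: not tracked inside the block
    result = a + leaf

print(leaf.requires_grad)     # True
print(result.requires_grad)   # False

# After the block, operations on `result` stay untracked
# because `result` has no graph behind it
later = result * 3
print(later.requires_grad)    # False

# Operations on `leaf` are tracked again once the block has exited
tracked = leaf * 3
print(tracked.requires_grad)  # True
```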

Error recovery

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

You tried to compute loss.backward() on output generated inside torch.no_grad(). Fix: remove the torch.no_grad() context during training, or only use it during validation/inference.
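The error and its fix can be reproduced in a few lines. This is a sketch with a toy model and random data; the exact error text may vary between PyTorch versions, so the code prints whatever message is raised rather than assuming it.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
x, target = torch.randn(3, 4), torch.randn(3, 1)

# Reproduce the error: the forward pass ran without gradient tracking
with torch.no_grad():
    bad_loss = nn.functional.mse_loss(model(x), target)

try:
    bad_loss.backward()
    raised = False
except RuntimeError as err:
    raised = True
    print(f"RuntimeError: {err}")

# Fix: run the training forward pass with tracking enabled
good_loss = nn.functional.mse_loss(model(x), target)
good_loss.backward()
print(model.weight.grad is not None)  # True
```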

Unexpected low model accuracy or loss not decreasing

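You wrapped part of your training forward pass in torch.no_grad() by mistake. If the whole forward pass is wrapped, loss.backward() raises the RuntimeError above; if only some layers are wrapped, backward() succeeds but those layers receive no gradients and never update. Fix: ensure torch.no_grad() is only around validation/test code, not training loops.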
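One way this symptom can appear without any error message is when only part of the model runs under torch.no_grad(). The sketch below assumes a hypothetical two-part model (encoder and head); backward() succeeds, but the encoder receives no gradient at all.

```python
import torch
import torch.nn as nn

encoder = nn.Linear(4, 4)  # hypothetical two-part model
head = nn.Linear(4, 1)
x, target = torch.randn(3, 4), torch.randn(3, 1)

with torch.no_grad():      # accidental: cuts the encoder out of the graph
    features = encoder(x)

loss = nn.functional.mse_loss(head(features), target)
loss.backward()            # no error, because the head is still tracked

print(encoder.weight.grad)           # None: the encoder never learns
print(head.weight.grad is not None)  # True
```

Checking for parameters whose .grad is None after backward() is a quick way to spot this kind of mistake.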

Experienced dev note

In PyTorch 2.11.x, torch.no_grad() is still the standard, but torch.inference_mode() (added in 1.9.0) can be slightly faster for inference because it also skips view tracking and version-counter bumps. For production inference pipelines, consider torch.inference_mode() instead; any extra speedup is workload-dependent, so benchmark before relying on a figure such as 5-10%. The trade-off is that tensors created in inference mode cannot later be used in computations recorded by autograd. torch.no_grad() is therefore safer if you're doing anything unusual (manual gradient computation, weight updates), because its outputs remain ordinary tensors that autograd can work with afterwards.
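A minimal sketch of torch.inference_mode() and its main restriction follows. It assumes a toy nn.Linear model; the weight tensor w is a hypothetical stand-in for any later computation that needs gradients.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 5).eval()
batch = torch.randn(2, 10)

with torch.inference_mode():
    out = model(batch)

print(out.requires_grad)   # False
print(out.is_inference())  # True: this is an "inference tensor"

# Restriction: an inference tensor cannot be saved for a later backward pass
w = torch.ones(5, requires_grad=True)  # hypothetical downstream parameter
try:
    (w * out).sum()
    raised = False
except RuntimeError as err:
    raised = True
    print(f"RuntimeError: {err}")

# Workaround: clone outside inference mode to get a normal tensor
normal_out = out.clone()
loss = (w * normal_out).sum()
loss.backward()
print(w.grad is not None)  # True
```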

Check your understanding

If you have a validation loop that processes 1000 batches, and you wrap only the model forward pass in torch.no_grad() but not the loss computation, will your GPU memory usage increase compared to wrapping both forward and loss together? Explain why or why not.

Show answer hint

Think about what happens after loss is computed: does the loss tensor need gradients? What about intermediate activations from the model?

VERSION torch.no_grad() has been available since PyTorch 0.4.0 (2018), where it replaced the older volatile flag. No breaking changes in 2.11.x. However, torch.inference_mode() (added in 1.9.0) is the modern alternative for pure inference and can be faster.

Next, learn torch.enable_grad() and @torch.inference_mode to fine-tune when you want gradients tracked, and explore mixed precision training with torch.autocast() which works alongside gradient tracking.
