Arithmetic: add, sub, mul, div
Why this matters
All neural network computations are built on tensor arithmetic. Understanding how these operations work is the foundation for building models, and knowing which operations track gradients is critical for training.
Explanation
Arithmetic operations in PyTorch perform element-wise math on tensors. The four basic operations (addition, subtraction, multiplication, and division) work intuitively: for inputs of the same shape the result has that shape, and for compatible but different shapes the result has the broadcast shape.
How it works: When you add two tensors, PyTorch aligns their shapes using broadcasting rules (dimensions are compared from the right, and a dimension of size 1 or a missing dimension is stretched to match the other tensor), then applies the operation element by element. Unlike NumPy, PyTorch can record these operations in a computational graph: when at least one input has `requires_grad=True`, the result is tracked and gradients can flow backward through the operation during backpropagation.
When to use: Use PyTorch arithmetic whenever tensors are part of a model or loss computation. NumPy is fine for one-off data transformations that don't need differentiation.
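The broadcasting and tracking rules described above can be sketched as follows. This is a minimal illustration, not part of the original lesson code; the names `matrix`, `row`, and `weights` are made up for the example.

```python
import torch

# A (2, 3) matrix and a (3,) vector: the vector is treated as (1, 3)
# and stretched along the first dimension to match the matrix.
matrix = torch.tensor([[1.0, 2.0, 3.0],
                       [4.0, 5.0, 6.0]])
row = torch.tensor([10.0, 20.0, 30.0])

summed = matrix + row
print(summed.shape)  # torch.Size([2, 3])
print(summed)

# torch.broadcast_shapes reports the result shape without doing any math.
print(torch.broadcast_shapes(matrix.shape, row.shape))

# Tracking depends on the inputs: the result requires grad only if
# at least one input does.
weights = torch.tensor([10.0, 20.0, 30.0], requires_grad=True)
plain = matrix + row
tracked = matrix + weights
print(plain.requires_grad, tracked.requires_grad)  # False True
```

Shapes that cannot be aligned this way, for example (2, 3) and (2,), raise a RuntimeError instead of broadcasting.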
Analogy
Tensor arithmetic is like doing the same math operation on every number in a spreadsheet simultaneously. Just as a spreadsheet broadcasts a formula across columns, PyTorch broadcasts smaller tensors to match larger ones before operating.
Code
import torch
# Create two simple tensors
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([10.0, 20.0, 30.0])
# Addition
result_add = a + b
print(f"Addition: {result_add}")
# Subtraction
result_sub = b - a
print(f"Subtraction: {result_sub}")
# Multiplication (element-wise)
result_mul = a * b
print(f"Multiplication: {result_mul}")
# Division (element-wise)
result_div = b / a
print(f"Division: {result_div}")
# Broadcasting example: scalar and tensor
scalar = torch.tensor(2.0)
result_broadcast = a * scalar
print(f"Broadcasting (scalar * tensor): {result_broadcast}")
# Verify gradients are tracked
a_with_grad = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
result = a_with_grad + torch.tensor([5.0, 5.0, 5.0])
print(f"\nGradient tracking enabled: {result.requires_grad}")
Output
Addition: tensor([11., 22., 33.])
Subtraction: tensor([ 9., 18., 27.])
Multiplication: tensor([10., 40., 90.])
Division: tensor([10., 10., 10.])
Broadcasting (scalar * tensor): tensor([2., 4., 6.])

Gradient tracking enabled: True
What just happened?
The code created two tensors and performed four arithmetic operations on them element-by-element, producing new tensors with matching shapes. Then it demonstrated broadcasting: a scalar tensor was multiplied with a 3-element tensor, and the scalar was automatically expanded to match the shape. Finally, it showed that arithmetic operations on tensors with `requires_grad=True` preserve the gradient tracking flag, meaning backpropagation can flow through these operations.
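To see that backpropagation really does flow through an arithmetic operation, one possible follow-up is to call `backward()` and inspect the gradient. This is a small sketch that reuses the same values as the lesson code; the `loss` variable is introduced here only for illustration.

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
b = torch.tensor([10.0, 20.0, 30.0])

# Reduce to a scalar so backward() needs no explicit gradient argument.
loss = (a * b).sum()
loss.backward()

# d(loss)/da[i] = b[i], so the gradient of a equals b.
print(a.grad)  # tensor([10., 20., 30.])

# b was created without requires_grad, so no gradient is stored for it.
print(b.grad)  # None
```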
Common gotcha
Many developers are unsure whether tensor arithmetic copies data. Out-of-place operations such as `a + b` allocate a new tensor for the result and leave the inputs unchanged; the result lives on the same device (CPU or GPU) as the inputs. More critically, a common mistake is forgetting that `a / b` performs true (floating-point) division, not integer division. If both tensors are integers and you expect integer division, use `torch.div(a, b, rounding_mode='floor')`. Without a rounding mode, you get a float result even if both inputs are int64.
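The division gotcha is easy to check directly. The sketch below uses made-up integer values, including a negative one so the difference between the `'floor'` and `'trunc'` rounding modes is visible; it assumes the default floating-point dtype has not been changed from float32.

```python
import torch

a = torch.tensor([7, -7, 9])  # integer literals give dtype torch.int64
b = torch.tensor([2, 2, 2])

# True division: the result is floating point even for integer inputs.
true_div = a / b
print(true_div.dtype)  # torch.float32

# Floor division rounds toward negative infinity and keeps the integer dtype.
floor_div = torch.div(a, b, rounding_mode='floor')
print(floor_div)  # tensor([ 3, -4,  4])

# Truncating division rounds toward zero instead.
trunc_div = torch.div(a, b, rounding_mode='trunc')
print(trunc_div)  # tensor([ 3, -3,  4])
```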
Error recovery
RuntimeError: broadcast size mismatch
TypeError: unsupported operand type(s)
RuntimeError: CUDA out of memory
Experienced dev note
In-place operations (like `a += b` or `a.add_(b)`) save memory but interfere with autograd: on a leaf tensor that requires gradients they raise a RuntimeError, and on intermediate tensors they can overwrite values that the backward pass still needs. Avoid in-place ops on tensors you'll backprop through. Also, `*` is element-wise multiplication, not matrix multiplication: use `@` or `torch.matmul()` for that. A point of confusion: adding a Python scalar (int/float) to a tensor works, and gradients still flow to the tensor, but the scalar is treated as a constant and receives no gradient. If a value needs a gradient, make it a tensor with `requires_grad=True`.
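The points in this note can be tried out in a few lines. This is an illustrative sketch; the tensors `w` and `m` and the `in_place_rejected` flag are hypothetical names chosen for the example.

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# An in-place update on a leaf tensor that requires grad is rejected.
in_place_rejected = False
try:
    w.add_(1.0)
except RuntimeError as err:
    in_place_rejected = True
    print(f"In-place op rejected: {err}")

# Multiplying by a Python scalar still lets gradients reach the tensor;
# the scalar itself is just a constant.
(w * 2.0).sum().backward()
print(w.grad)  # tensor([2., 2., 2.])

# Element-wise product versus matrix product.
m = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0]])
print(m * m)  # each entry squared: [[1, 4], [9, 16]]
print(m @ m)  # matrix product:     [[7, 10], [15, 22]]
```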
Check your understanding
You have two tensors: `x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)` and `y = torch.tensor([[10.0], [20.0], [30.0]])`. What happens when you compute `z = x + y`, and why? (Hint: what shapes do they have, and what shape is the result?)
Answer hint
A correct answer explains broadcasting: x has shape (3,) and y has shape (3, 1). During addition, x is treated as shape (1, 3); both tensors are then broadcast to (3, 3), so the result has shape (3, 3) with z[i][j] = x[j] + y[i]. The answer must also note that z.requires_grad will be True because at least one input has requires_grad=True.
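One way to check your answer is to run the exercise as written. The snippet uses the tensors from the question and only prints the properties under discussion.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)  # shape (3,)
y = torch.tensor([[10.0], [20.0], [30.0]])             # shape (3, 1)

z = x + y
print(z.shape)          # torch.Size([3, 3])
print(z.requires_grad)  # True

# Row i holds y[i] added to every element of x.
print(z)
```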