Broadcasting rules: shape compatibility
Why this matters
Broadcasting is how PyTorch lets you add a scalar to a matrix or multiply tensors of different shapes without manual expansion. Misunderstanding it leads to shape-mismatch errors or, worse, silently wrong results in production models.
Explanation
Broadcasting is NumPy/PyTorch's automatic shape-alignment rule that lets operations work on tensors of different shapes by virtually expanding dimensions. The core rule: align shapes from the right, then expand each dimension of size 1 to match the other tensor's size in that position. For example, a tensor of shape (3, 1) can broadcast with shape (3, 5): the 1 is expanded to 5 during the operation, but the tensor isn't actually modified in memory. Mechanically, PyTorch checks dimensions from right to left. A pair of dimensions is compatible if the sizes are equal, if one of them is 1, or if one tensor has no dimension at that position. If any pair is incompatible (e.g., 3 and 5 don't match and neither is 1), you get a RuntimeError. This matters because broadcasting silently succeeds in ways that surprise developers: a tensor of shape (1, 100) combines with a batch of shape (64, 100) without error, even if it was supposed to hold a separate row for each of the 64 samples.
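The right-to-left check described above can be sketched in plain Python. This is only an illustration of the rule, not PyTorch's actual implementation, and the helper name broadcast_shape is hypothetical.

```python
from itertools import zip_longest


def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible.

    Hypothetical helper for illustration; it mirrors the rule in the text,
    not PyTorch's internal code.
    """
    result = []
    # Walk both shapes from the rightmost dimension. A dimension that is
    # missing on the shorter shape is treated as size 1.
    for a, b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if a == b or b == 1:
            result.append(a)
        elif a == 1:
            result.append(b)
        else:
            raise ValueError(f"incompatible sizes {a} and {b}")
    return tuple(reversed(result))


print(broadcast_shape((3, 1), (3, 5)))       # (3, 5)
print(broadcast_shape((64, 10), (10,)))      # (64, 10)
print(broadcast_shape((1, 100), (64, 100)))  # (64, 100)
```

Calling broadcast_shape((2, 2), (1, 3)) raises ValueError, because the rightmost sizes 2 and 3 differ and neither is 1.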
Analogy
Think of broadcasting like "fill in the blanks" with a photocopy machine. If you have a template (1, 5) and need it to match (3, 5), the machine automatically copies the single row 3 times. But if you ask for (1, 5) to match (3, 7), the machine jams because a row of 5 columns cannot line up with a row of 7. The machine only duplicates; it can't stretch or shrink.
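Unlike a real photocopier, PyTorch does not produce physical copies of the row. A small sketch, using Tensor.expand as an explicit stand-in for what broadcasting does implicitly (the variable names are illustrative), shows that the expanded tensor is a view over the same memory:

```python
import torch

template = torch.arange(5).reshape(1, 5)   # shape (1, 5)
expanded = template.expand(3, 5)           # a view with shape (3, 5)

print(expanded.shape)     # torch.Size([3, 5])
# A stride of 0 in the first dimension means every row reads the same memory.
print(expanded.stride())  # (0, 1)
# Same underlying storage address: no data was copied.
print(expanded.data_ptr() == template.data_ptr())  # True
```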
Code
import torch
# Example 1: scalar broadcast with matrix
matrix = torch.tensor([[1, 2, 3], [4, 5, 6]])
scalar = torch.tensor(10)
result = matrix + scalar
print(f"Matrix shape: {matrix.shape}, Scalar shape: {scalar.shape}")
print(f"Result shape: {result.shape}")
print(f"Result:\n{result}")
print()
# Example 2: dimension-1 broadcast
vector = torch.tensor([[1], [2], [3]]) # shape (3, 1)
matrix = torch.tensor([[10, 20, 30], [40, 50, 60], [70, 80, 90]]) # shape (3, 3)
result = vector + matrix
print(f"Vector shape: {vector.shape}, Matrix shape: {matrix.shape}")
print(f"Result shape: {result.shape}")
print(f"Result:\n{result}")
print()
# Example 3: both operands expand (row plus column)
row = torch.tensor([[1, 2, 3]]) # shape (1, 3)
col = torch.tensor([[10], [20], [30]]) # shape (3, 1)
result = row + col # broadcasts to (3, 3)
print(f"Row shape: {row.shape}, Col shape: {col.shape}")
print(f"Result shape: {result.shape}")
print(f"Result:\n{result}")
print()
# Example 4: broadcasting fails
try:
    a = torch.tensor([[1, 2], [3, 4]]) # shape (2, 2)
    b = torch.tensor([[1, 2, 3]]) # shape (1, 3)
    result = a + b
except RuntimeError as e:
    print(f"RuntimeError: {e}")
Output
Matrix shape: torch.Size([2, 3]), Scalar shape: torch.Size([])
Result shape: torch.Size([2, 3])
Result:
tensor([[11, 12, 13],
        [14, 15, 16]])
Vector shape: torch.Size([3, 1]), Matrix shape: torch.Size([3, 3])
Result shape: torch.Size([3, 3])
Result:
tensor([[11, 21, 31],
        [42, 52, 62],
        [73, 83, 93]])
Row shape: torch.Size([1, 3]), Col shape: torch.Size([3, 1])
Result shape: torch.Size([3, 3])
Result:
tensor([[11, 12, 13],
        [21, 22, 23],
        [31, 32, 33]])
RuntimeError: The size of tensor a (2) does not match the size of tensor b (3) at non-singleton dimension 1
What just happened?
The code demonstrated four broadcasting scenarios: (1) a scalar broadcasts across all elements of a matrix by treating it as shape (); (2) a column vector (3, 1) broadcasts across rows of a (3, 3) matrix by expanding the 1 to 3; (3) a row (1, 3) and column (3, 1) both expand to (3, 3) because each has a dimension of 1; (4) an attempt to add (2, 2) and (1, 3) fails at runtime because the final dimension is 2 versus 3, and neither is 1, so no valid broadcast exists.
Common gotcha
Developers often assume broadcasting requires both shapes to match exactly, but it only requires them to be compatible: each pair of aligned dimensions must be equal or contain a 1. This causes subtle bugs. For example, a (64,) loss combined with a (1, 64, 1) tensor does not fail as you might expect; it silently broadcasts to (1, 64, 64), and the model computes the wrong gradient update.
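A minimal sketch of this gotcha, using the two shapes mentioned above. The tensor names and the all-ones values are assumptions chosen to make the effect easy to see.

```python
import torch

per_sample_loss = torch.ones(64)   # shape (64,)
weights = torch.ones(1, 64, 1)     # shape (1, 64, 1), hypothetical weights

# Right-aligned, (64,) is treated as (1, 1, 64), so the result is (1, 64, 64).
combined = per_sample_loss * weights
print(combined.shape)         # torch.Size([1, 64, 64])
print(combined.sum().item())  # 4096.0, where 64.0 was intended

# Reshaping explicitly to the intended layout avoids the accidental expansion.
intended = per_sample_loss * weights.reshape(64)
print(intended.shape)         # torch.Size([64])
```

No error is raised at any point, which is why a shape assertion right after the operation is the only reliable signal.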
Error recovery
RuntimeError: The size of tensor a (X) does not match the size of tensor b (Y) at non-singleton dimension N
Silent wrong result (values correct magnitude but applied to wrong batch element)
Experienced dev note
Beginners think broadcasting is a convenience; senior devs know it's a landmine. In production models, a single broadcast mistake can train on the wrong loss for weeks before validation catches it. Always be explicit: if you're relying on broadcasting, add a comment and a shape assertion. torch.broadcast_shapes(shape1, shape2) lets you verify broadcast compatibility without computing anything, which makes it a cheap safety check at model initialization.
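One way such a check might look is sketched below. torch.broadcast_shapes is a real PyTorch function; the wrapper assert_broadcasts_to is a hypothetical helper introduced here for illustration.

```python
import torch

# Compute the broadcast shape from shapes alone; no tensors are allocated.
print(torch.broadcast_shapes((64, 10), (10,)))  # torch.Size([64, 10])


def assert_broadcasts_to(expected, *shapes):
    """Hypothetical helper: fail early if shapes broadcast to an unexpected result."""
    actual = tuple(torch.broadcast_shapes(*shapes))
    assert actual == tuple(expected), (
        f"expected broadcast shape {tuple(expected)}, got {actual}"
    )


# Passes: the intended per-class weighting.
assert_broadcasts_to((64, 10), (64, 10), (10,))

# Catches a broadcast that would succeed silently but give the wrong shape.
try:
    assert_broadcasts_to((64,), (64,), (1, 64, 1))
except AssertionError as e:
    print(f"AssertionError: {e}")
```

The design choice is to assert on the expected result shape rather than on mere compatibility, since the dangerous cases are exactly the ones where broadcasting succeeds.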
Check your understanding
You have a batch of logits with shape (64, 10) and a per-class weight tensor of shape (10,). Will they broadcast successfully when you multiply them? Why or why not? What is the resulting shape?
Answer hint
The answer requires understanding that (10,) broadcasts by adding a dimension on the left (treated as (1, 10)), which then broadcasts to (64, 10). The key insight is that dimensions are aligned from the RIGHT, not left.
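The hint can be checked with a short sketch. The random values and variable names are placeholders; only the shapes matter.

```python
import torch

logits = torch.randn(64, 10)    # batch of logits, shape (64, 10)
class_weights = torch.rand(10)  # per-class weights, shape (10,)

# (10,) is right-aligned against (64, 10) and treated as (1, 10).
weighted = logits * class_weights
print(weighted.shape)  # torch.Size([64, 10])

# Writing the added left dimension explicitly gives the same values.
explicit = logits * class_weights.unsqueeze(0)  # shape (1, 10)
print(torch.equal(weighted, explicit))  # True
```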