Code Beginner easy · 5 min

Broadcasting rules: shape compatibility

What you will learn

Broadcasting virtually expands tensor shapes so that element-wise operations between tensors of different sizes work without explicit reshaping or copying.

Why this matters

Broadcasting is how PyTorch lets you add a scalar to a matrix, or multiply tensors of different shapes, without manual expansion. Misunderstanding it can cause silent shape mismatches or unexpected results in production models.

Skip if: you explicitly want to enforce exact shape matching to catch bugs early. In that case, use assert tensor1.shape == tensor2.shape before operations when you're debugging shape issues in a model pipeline.

Explanation

Broadcasting is the automatic shape-alignment rule shared by NumPy and PyTorch: it lets element-wise operations work on tensors of different shapes by virtually expanding dimensions. The core rule: align the shapes from the right, treat any missing leading dimensions as size 1, then expand every size-1 dimension to match the other tensor. For example, a tensor of shape (3, 1) can broadcast with shape (3, 5): the 1 is expanded to 5 during the operation, but the tensor is not copied or modified in memory.

Mechanically, PyTorch compares dimensions from right to left. Each pair is compatible if the sizes are equal, if one of them is 1, or if one tensor has no dimension at that position. If any pair is incompatible (e.g., 3 and 5: they differ and neither is 1), you get a RuntimeError.

This matters because broadcasting can succeed silently when you didn't intend it to. If one tensor has shape (1, 100) and the other (64, 100), the single row is applied to all 64 samples without any error, even if the first tensor was supposed to hold a full batch of its own.
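The right-to-left rule is short enough to write out by hand. The sketch below uses a hypothetical helper, broadcast_shape, which is an illustration and not PyTorch's internal implementation; it pads the shorter shape with 1s on the left, checks each aligned pair, and cross-checks the result against torch.broadcast_shapes.

```python
import torch

def broadcast_shape(shape_a, shape_b):
    # Hypothetical helper for illustration, not PyTorch's internal implementation.
    ndim = max(len(shape_a), len(shape_b))
    # Missing leading dimensions behave like size 1, so pad on the left
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    # After padding, each aligned pair is checked independently
    for dim, (da, db) in enumerate(zip(a, b)):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise RuntimeError(f"sizes {da} and {db} are incompatible at dimension {dim}")
    return tuple(result)

# Cross-check against PyTorch's own shape computation
pairs = [((3, 1), (3, 5)), ((2, 3), ()), ((1, 3), (3, 1)), ((4, 1, 6), (5, 1))]
for sa, sb in pairs:
    ours = broadcast_shape(sa, sb)
    theirs = tuple(torch.broadcast_shapes(sa, sb))
    print(f"{sa} with {sb} -> {ours}, matches torch: {ours == theirs}")
```

Padding on the left first is what turns a missing dimension into an ordinary size-1 dimension, so the pairwise check only has to handle "equal" and "one of them is 1".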

Analogy

Think of broadcasting as a photocopy machine filling in the blanks. If you have a template of shape (1, 5) and need it to match (3, 5), the machine copies the single row 3 times. But if you ask it to turn (1, 5) into (3, 7), the machine jams: it can copy a row of 5 columns, but it can't stretch that row to 7 columns or shrink it.
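In PyTorch the "photocopies" are virtual. Tensor.expand performs the same size-1 expansion that broadcasting uses, and the sketch below shows that the result is a view with stride 0 along the expanded dimension, sharing memory with the original row. The tensor values are arbitrary placeholders.

```python
import torch

row = torch.tensor([[1, 2, 3, 4, 5]])  # shape (1, 5)

# expand() creates the "copied" rows as a view: no data is duplicated
tiled = row.expand(3, 5)
print(tiled.shape)     # torch.Size([3, 5])
print(tiled.stride())  # (0, 1): stride 0 means every row reads the same memory
print(tiled.data_ptr() == row.data_ptr())  # True: same underlying storage

# Asking for 7 columns from a 5-column row fails, like the jammed machine
try:
    row.expand(3, 7)
except RuntimeError as e:
    print("RuntimeError:", e)
```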

Code

python

import torch

# Example 1: scalar broadcast with matrix
matrix = torch.tensor([[1, 2, 3], [4, 5, 6]])
scalar = torch.tensor(10)
result = matrix + scalar
print(f"Matrix shape: {matrix.shape}, Scalar shape: {scalar.shape}")
print(f"Result shape: {result.shape}")
print(f"Result:\n{result}")
print()

# Example 2: dimension-1 broadcast
vector = torch.tensor([[1], [2], [3]])  # shape (3, 1)
matrix = torch.tensor([[10, 20, 30], [40, 50, 60], [70, 80, 90]])  # shape (3, 3)
result = vector + matrix
print(f"Vector shape: {vector.shape}, Matrix shape: {matrix.shape}")
print(f"Result shape: {result.shape}")
print(f"Result:\n{result}")
print()

# Example 3: both operands expand (each has a size-1 dimension)
row = torch.tensor([[1, 2, 3]])  # shape (1, 3)
col = torch.tensor([[10], [20], [30]])  # shape (3, 1)
result = row + col  # broadcasts to (3, 3)
print(f"Row shape: {row.shape}, Col shape: {col.shape}")
print(f"Result shape: {result.shape}")
print(f"Result:\n{result}")
print()

# Example 4: broadcasting fails
try:
    a = torch.tensor([[1, 2], [3, 4]])  # shape (2, 2)
    b = torch.tensor([[1, 2, 3]])  # shape (1, 3)
    result = a + b
except RuntimeError as e:
    print(f"RuntimeError: {e}")

Output

Matrix shape: torch.Size([2, 3]), Scalar shape: torch.Size([])
Result shape: torch.Size([2, 3])
Result:
tensor([[11, 12, 13],
        [14, 15, 16]])

Vector shape: torch.Size([3, 1]), Matrix shape: torch.Size([3, 3])
Result shape: torch.Size([3, 3])
Result:
tensor([[11, 21, 31],
        [42, 52, 62],
        [73, 83, 93]])

Row shape: torch.Size([1, 3]), Col shape: torch.Size([3, 1])
Result shape: torch.Size([3, 3])
Result:
tensor([[11, 12, 13],
        [21, 22, 23],
        [31, 32, 33]])

RuntimeError: The size of tensor a (2) must match the size of tensor b (3) at non-singleton dimension 1

What just happened?

The code demonstrated four broadcasting scenarios: (1) a 0-dimensional tensor (shape ()) is treated as having size 1 in every dimension, so it is added to every element of the matrix; (2) a column vector of shape (3, 1) broadcasts against a (3, 3) matrix by expanding its 1 to 3, so row i of the matrix gets vector[i] added; (3) a row (1, 3) and a column (3, 1) both expand to (3, 3) because each has a size-1 dimension where the other does not; (4) adding (2, 2) and (1, 3) fails at runtime because the last dimensions are 2 and 3, neither is 1, and so no valid broadcast exists.
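Examples 2 to 4 all combine 2-D tensors. A minimal sketch of the remaining case, where one operand simply has fewer dimensions, looks like this; the tensor names and values are illustrative assumptions. The 1-D tensor of shape (3,) is treated as (1, 3) and then expanded across both rows.

```python
import torch

# A genuinely missing dimension: a 1-D tensor against a 2-D tensor
bias = torch.tensor([100, 200, 300])     # shape (3,), treated as (1, 3)
matrix = torch.tensor([[1, 2, 3],
                       [4, 5, 6]])       # shape (2, 3)
result = matrix + bias                   # bias is added to every row
print(result.shape)  # torch.Size([2, 3])
print(result)
# tensor([[101, 202, 303],
#         [104, 205, 306]])
```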

Common gotcha

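Developers often assume broadcasting requires both shapes to match exactly, so any mismatch will raise an error. In fact it only requires the shapes to be compatible: each aligned pair of dimensions must be equal or include a 1, and missing leading dimensions count as 1. This causes subtle bugs. Combining a (64,) per-sample loss with a (1, 64, 1) tensor produces shape (1, 64, 64) instead of failing, so any reduction over the result silently computes the wrong value and the wrong gradient update.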
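The gotcha can be reproduced in a few lines. The tensor names below are hypothetical and the values random; the second half shows the same trap in a regression-style loss, where predictions of shape (64, 1) minus targets of shape (64,) broadcast to (64, 64) instead of raising an error.

```python
import torch

per_sample_loss = torch.rand(64)   # shape (64,)
mask = torch.ones(1, 64, 1)        # hypothetical (1, 64, 1) tensor
combined = per_sample_loss * mask  # no error is raised
print(combined.shape)  # torch.Size([1, 64, 64])

# Same trap in a regression loss: predictions (64, 1) vs targets (64,)
preds = torch.rand(64, 1)
targets = torch.rand(64)
diff = preds - targets             # every prediction minus every target
print(diff.shape)  # torch.Size([64, 64])
mse_wrong = (diff ** 2).mean()                          # averages 64 * 64 pairs
mse_right = ((preds.squeeze(1) - targets) ** 2).mean()  # (64,) vs (64,)
```

Squeezing the trailing size-1 dimension (or unsqueezing the targets) is the explicit fix: once both operands have the same shape, broadcasting no longer has anything to expand.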

Error recovery

RuntimeError: The size of tensor a (X) must match the size of tensor b (Y) at non-singleton dimension N

Align the shapes from the right: write out both shapes and check each pair of dimensions right to left. If a pair differs and neither size is 1, reshape one tensor explicitly (for example with .view(), .reshape(), or .unsqueeze()) so the intended dimensions line up before the operation.
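A minimal sketch of that fix, assuming a hypothetical per-row offset that should be added to every column of its row. Aligned from the right, (2, 3) and (2,) compare 3 with 2 and fail; unsqueeze(1) turns the offset into (2, 1), which lines it up with the row dimension. row_offset.view(2, 1) would work equally well here.

```python
import torch

scores = torch.arange(6.0).reshape(2, 3)  # shape (2, 3)
row_offset = torch.tensor([10.0, 20.0])   # hypothetical: one value per row, shape (2,)

# Right-aligned: (2, 3) vs (2,) compares 3 with 2, so this raises
try:
    scores + row_offset
except RuntimeError as e:
    print("RuntimeError:", e)

# unsqueeze(1) makes the offset (2, 1), matching the row dimension
fixed = scores + row_offset.unsqueeze(1)
print(fixed.shape)  # torch.Size([2, 3])
print(fixed)
```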

Silent wrong result (values of plausible magnitude, but applied to the wrong batch elements)

Print or assert tensor shapes before and after any operation that relies on broadcasting. Broadcasting produces no warning, so explicit shape checks are your main defense. Use torch.broadcast_tensors(a, b) to get the expanded views of both operands and inspect their shapes without performing the operation itself.
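A minimal sketch of both checks. The shapes (64, 1) and (64,) are an illustrative mismatch, and check_same_shape is a hypothetical helper name, not a PyTorch API.

```python
import torch

a = torch.zeros(64, 1)
b = torch.zeros(64)

# broadcast_tensors returns expanded views of the inputs without doing the add
a_b, b_b = torch.broadcast_tensors(a, b)
print(a_b.shape, b_b.shape)  # torch.Size([64, 64]) torch.Size([64, 64])

# Hypothetical helper: a cheap guard before an op you expect to preserve shape
def check_same_shape(x, y):
    assert x.shape == y.shape, f"shape mismatch: {tuple(x.shape)} vs {tuple(y.shape)}"

try:
    check_same_shape(a, b)
except AssertionError as e:
    print(e)  # shape mismatch: (64, 1) vs (64,)
```

Keep in mind that Python's assert statements are skipped when the interpreter runs with -O, so an explicit if/raise is safer for checks that must survive in production.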

Experienced dev note

Beginners treat broadcasting as a convenience; senior devs also treat it as a landmine. In production models, a single broadcast mistake can train on the wrong loss for weeks before validation catches it. Be explicit: if you rely on broadcasting, add a comment and a shape assertion. torch.broadcast_shapes(shape1, shape2) lets you verify broadcast compatibility and get the resulting shape without creating the full tensors or running the operation, which makes it a cheap safety check at model initialization.
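A sketch of an initialization-time check built on torch.broadcast_shapes. assert_broadcasts_to is a hypothetical helper, and the layer shapes are illustrative assumptions rather than values from any particular model.

```python
import torch

def assert_broadcasts_to(expected, *shapes):
    # Hypothetical helper: fail fast if `shapes` don't broadcast to `expected`.
    got = torch.broadcast_shapes(*shapes)  # raises RuntimeError if incompatible
    assert got == torch.Size(expected), (
        f"{shapes} broadcast to {tuple(got)}, expected {tuple(expected)}"
    )

# Illustrative: hidden activations (32, 128) plus a bias vector (128,)
assert_broadcasts_to((32, 128), (32, 128), (128,))  # passes silently

# Illustrative: predictions (32, 1) minus targets (32,) broadcast to (32, 32)
try:
    assert_broadcasts_to((32, 1), (32, 1), (32,))
except AssertionError as e:
    print(e)  # ((32, 1), (32,)) broadcast to (32, 32), expected (32, 1)
```

Because only shapes are passed in, this kind of check can run once when a model is built, before any data flows through it.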

Check your understanding

You have a batch of logits with shape (64, 10) and a per-class weight tensor of shape (10,). Will they broadcast successfully when you multiply them? Why or why not? What is the resulting shape?

Answer hint

The answer requires understanding that (10,) broadcasts by adding a dimension on the left (treated as (1, 10)), which then broadcasts to (64, 10). The key insight is that dimensions are aligned from the RIGHT, not left.
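You can confirm the hint by running the multiplication directly. The random values are placeholders; only the shapes matter.

```python
import torch

logits = torch.randn(64, 10)  # batch of 64 samples, 10 classes
weights = torch.rand(10)      # one weight per class, shape (10,)

weighted = logits * weights   # (10,) is treated as (1, 10), then expanded to (64, 10)
print(weighted.shape)         # torch.Size([64, 10])

# Every row is scaled by the same per-class weights
print(torch.allclose(weighted[0], logits[0] * weights))  # True
```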

VERSION Broadcasting rules have been stable across PyTorch versions 1.0 through 2.11.x with no breaking changes, and the behavior is consistent with NumPy broadcasting. The helper torch.broadcast_shapes is only available from PyTorch 1.8 onward.

Next, learn how to explicitly reshape and squeeze/unsqueeze tensors to control dimensions when broadcasting doesn't give you the alignment you want.
