Code Beginner easy · 6 min

nn.Module: the base class for all networks

What you will learn

nn.Module is the container that holds all your model's learnable parameters and the forward pass logic that transforms inputs to outputs.

Why this matters

Every neural network you build in PyTorch inherits from nn.Module, so understanding how it works is the foundation for building anything from simple classifiers to transformer models. Without it, you won't know how to structure code that PyTorch can actually train.

Skip if: You don't need nn.Module if you're only doing mathematical operations without learnable parameters (pure tensor math). You also don't need to write your own subclass for simple inference-only code that loads a pre-trained model and calls it once, though the loaded model is itself an nn.Module, so the concepts here still apply.

Explanation

What it is: nn.Module is PyTorch's base class for all neural network components. When you create a network, you inherit from it and define two things: what parameters your model has (in __init__) and how data flows through them (in forward()).

How it works: When you call model(input), nn.Module's __call__ method runs your forward() method, along with any registered hooks. Tensors wrapped in nn.Parameter, and the parameters of any child nn.Module assigned as an attribute, are registered automatically, so model.parameters() returns them and autograd computes their gradients during backprop. The module also provides helpers for device placement (model.to(...)) and for switching between training and evaluation modes (model.train() and model.eval()). Built-in layers such as nn.Linear initialize their own parameters when they are constructed.

When to use it: Use nn.Module for anything with learnable weights: linear layers, convolutions, embeddings, attention blocks, or any custom layer you want to train.
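The registration and mode-switching behavior described above can be seen in a minimal sketch. The class name ScaledLinear and its scale parameter are hypothetical, chosen only for illustration.

```python
import torch
import torch.nn as nn

class ScaledLinear(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, in_features, out_features):
        super().__init__()
        # Child module: its weight and bias are registered automatically.
        self.linear = nn.Linear(in_features, out_features)
        # Bare learnable tensor: wrapping it in nn.Parameter registers it.
        self.scale = nn.Parameter(torch.ones(1))

    def forward(self, x):
        return self.scale * self.linear(x)

layer = ScaledLinear(10, 3)

# Both the direct parameter and the child module's parameters are tracked.
param_names = sorted(name for name, _ in layer.named_parameters())
print(param_names)      # ['linear.bias', 'linear.weight', 'scale']

out = layer(torch.randn(4, 10))  # the call syntax runs forward()
print(out.shape)        # torch.Size([4, 3])

# Modules start in training mode; eval() and train() flip the flag.
layer.eval()
print(layer.training)   # False
layer.train()
print(layer.training)   # True
```

The training flag matters for layers such as dropout and batch normalization, which behave differently in the two modes.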

Analogy

Think of nn.Module like a recipe card. The __init__ method lists your ingredients (parameters like weights and biases). The forward() method is the cooking instructions that say how to combine those ingredients. PyTorch's training loop is the chef who reads the card, executes the recipe, tastes the output (loss), and adjusts the ingredients for next time.

Code

python

import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, output_size)
    
    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

model = SimpleNet(input_size=10, hidden_size=32, output_size=2)
input_tensor = torch.randn(4, 10)
output = model(input_tensor)

print(f"Input shape: {input_tensor.shape}")
print(f"Output shape: {output.shape}")
print(f"\nModel parameters:")
for name, param in model.named_parameters():
    print(f"  {name}: {param.shape}")
print(f"\nTotal parameters: {sum(p.numel() for p in model.parameters())}")

Output

Input shape: torch.Size([4, 10])
Output shape: torch.Size([4, 2])

Model parameters:
  fc1.weight: torch.Size([32, 10])
  fc1.bias: torch.Size([32])
  fc2.weight: torch.Size([2, 32])
  fc2.bias: torch.Size([2])

Total parameters: 418

What just happened?

We defined a custom network class that inherits from nn.Module with two linear layers and a ReLU activation. We instantiated it with specific dimensions, created a random batch of 4 samples with 10 features each, passed it through the model via the forward pass, and verified that the output shape matched what we expected (batch size 4, 2 output classes). PyTorch automatically tracked all the weight and bias parameters: we can iterate over them and see they exist in memory ready for gradient computation.

Common gotcha

A common mistake is forgetting to call super().__init__() at the start of your __init__ method. Without it, the internal dictionaries that nn.Module uses to register parameters and submodules are never created, so the first assignment such as self.fc1 = nn.Linear(...) fails with an AttributeError along the lines of "cannot assign module before Module.__init__() call". The error is raised as soon as you construct the model, so it is easy to spot once you know what the message means.
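A quick way to see what a missing super().__init__() call does is to trigger it deliberately. In current PyTorch releases, assigning a submodule before Module.__init__() has run raises an AttributeError. The class name ForgotSuper is hypothetical, and the exact wording of the message may vary between versions.

```python
import torch.nn as nn

class ForgotSuper(nn.Module):  # hypothetical class, for illustration only
    def __init__(self):
        # super().__init__() is intentionally missing here.
        self.fc1 = nn.Linear(10, 2)

error_message = None
try:
    ForgotSuper()
except AttributeError as err:
    # Raised by nn.Module's attribute-assignment logic.
    error_message = str(err)

print(error_message)
```

Adding super().__init__() as the first line of __init__ makes the assignment succeed.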

Error recovery

AttributeError: 'YourNet' object has no attribute 'fc1'

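You're trying to access self.fc1 but never assigned it in __init__, or you misspelled the name. Also check that you didn't assign it somewhere other than __init__ (like in forward): the attribute then doesn't exist until that code runs, the layer is re-created with fresh weights on every call, and an optimizer built beforehand never sees its parameters.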

RuntimeError: Expected all tensors to be on the same device

You created parameters in __init__ (which default to CPU) but passed GPU tensors to forward. Call model.to('cuda') after creating the model, or ensure your input tensor is on the same device as your model with input_tensor.to(model.fc1.weight.device).
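A device-agnostic version of this fix, as a minimal sketch. It falls back to the CPU when no GPU is available and uses a single nn.Linear as a stand-in for your model.

```python
import torch
import torch.nn as nn

# Pick the GPU when one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 2).to(device)  # stand-in for your own model

x = torch.randn(4, 10)  # created on the CPU by default
# Move the input to wherever the model's parameters live.
x = x.to(next(model.parameters()).device)

out = model(x)
print(out.device == model.weight.device)  # True
```

Reading the device from next(model.parameters()) avoids hard-coding a layer name such as fc1.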

TypeError: forward() missing 1 required positional argument: 'x'

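This usually means forward() never received an input: for example, you called model() with no arguments, or you called forward on the class (YourNet.forward(x)) rather than on an instance. Pass the input to the instance with model(x). Also prefer model(x) over model.forward(x): the call syntax goes through __call__, which runs any registered hooks before and after forward(), while calling forward() directly skips them.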
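The practical difference between model(x) and model.forward(x) shows up once a hook is registered. This sketch uses a forward hook on a single nn.Linear; the helper record_call and the list calls are hypothetical names.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)
calls = []

def record_call(module, inputs, output):
    # Forward hooks run after forward() when the module is called.
    calls.append(type(module).__name__)

handle = model.register_forward_hook(record_call)
x = torch.randn(4, 10)

model(x)           # goes through __call__, so the hook fires
print(len(calls))  # 1

model.forward(x)   # bypasses __call__, so the hook is skipped
print(len(calls))  # 1

handle.remove()

# Calling the model without an input reproduces the "missing argument" TypeError.
try:
    model()
except TypeError as err:
    print(err)
```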

Experienced dev note

A subtle thing: when you assign a child module as an attribute (like self.fc1 = nn.Linear(...)), nn.Module's overridden __setattr__ registers it. This only works for direct attribute assignment of an nn.Module or nn.Parameter. If you build a plain Python list like self.layers = [nn.Linear(...), nn.Linear(...)], those parameters won't be tracked: use nn.ModuleList instead. Similarly for dictionaries, use nn.ModuleDict. This catches even experienced developers when they refactor code for flexibility.
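The difference is easy to check by counting parameters. In this minimal sketch the two class names are hypothetical, and forward() is omitted because only registration is being inspected.

```python
import torch.nn as nn

class PlainListNet(nn.Module):  # hypothetical: layers hidden in a Python list
    def __init__(self):
        super().__init__()
        self.layers = [nn.Linear(10, 10), nn.Linear(10, 2)]

class ModuleListNet(nn.Module):  # hypothetical: same layers in nn.ModuleList
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(10, 10), nn.Linear(10, 2)])

plain_count = sum(p.numel() for p in PlainListNet().parameters())
tracked_count = sum(p.numel() for p in ModuleListNet().parameters())

print(plain_count)    # 0: the list's layers are invisible to PyTorch
print(tracked_count)  # 132: (10*10 + 10) + (10*2 + 2)
```

An optimizer built from PlainListNet().parameters() would have nothing to update, and model.to(...) would leave those layers where they are.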

Check your understanding

If you moved your model to GPU with model.to('cuda'), but accidentally created a new parameter inside the forward() method (like a learnable scale factor initialized fresh in forward), would that parameter be on GPU or CPU, and why would that break training?

Answer hint

A correct answer recognizes that a parameter created in forward() would be on the CPU (the default device) while your input and other parameters are on the GPU, causing a device mismatch error. More importantly, it identifies that the parameter is never registered: it is re-created with its initial value on every call and never reaches the optimizer, so it cannot be learned. Parameters must be created in __init__ so they're registered and tracked properly.
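The registration half of the answer can be checked on the CPU alone. This sketch compares a scale factor created in forward() with one created in __init__; both class names are hypothetical.

```python
import torch
import torch.nn as nn

class ScaleInForward(nn.Module):  # hypothetical: the problematic pattern
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        # Local variable: new on every call, on the default device, never registered.
        scale = nn.Parameter(torch.ones(1))
        return scale * self.fc(x)

class ScaleInInit(nn.Module):  # hypothetical: the recommended pattern
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)
        self.scale = nn.Parameter(torch.ones(1))

    def forward(self, x):
        return self.scale * self.fc(x)

broken_names = sorted(n for n, _ in ScaleInForward().named_parameters())
fixed_names = sorted(n for n, _ in ScaleInInit().named_parameters())

print(broken_names)  # ['fc.bias', 'fc.weight']
print(fixed_names)   # ['fc.bias', 'fc.weight', 'scale']
```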

VERSION PyTorch 2.11.x (March 2026): no breaking changes to nn.Module itself since 1.0.0, but in 2.0+ you can wrap an nn.Module instance with torch.compile(model) for a potential speedup without changing the module's own code. The pattern remains stable.

Next, learn about nn.Sequential: a simpler way to stack layers linearly when you don't need custom logic in forward().
