Code Beginner easy · 4 min

model.parameters(): accessing weights

What you will learn

<code>model.parameters()</code> returns an iterator over all learnable weights in your neural network that the optimizer will update during training.

Why this matters

You need to access model weights to pass them to an optimizer, inspect training progress, save/load models, or freeze specific layers during fine-tuning. This is the fundamental bridge between your model and the training loop.

Skip if: When you only need to make predictions, you don't call <code>model.parameters()</code>: just call <code>model(input_tensor)</code> directly. You also skip it if you're using a high-level framework that abstracts parameter access (though you should understand it anyway).

Explanation

What it is: model.parameters() is a method on any nn.Module that returns a generator over every registered parameter tensor (weights and biases) in your network, including any you have frozen. It traverses the module tree recursively, so it finds parameters in nested layers automatically.
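To make the recursive traversal concrete, here is a minimal sketch. The class names Block and Nested are hypothetical and exist only for this illustration; they define no forward method because the example never runs data through them.

```python
import torch.nn as nn

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)   # 4*4 weights + 4 biases = 20 values

class Nested(nn.Module):
    def __init__(self):
        super().__init__()
        self.block = Block()            # a submodule that has its own submodule
        self.out = nn.Linear(4, 1)      # 4 weights + 1 bias = 5 values

nested = Nested()

# Parameter names reflect the path through the module tree.
names = [name for name, _ in nested.named_parameters()]
print(names)
# ['block.linear.weight', 'block.linear.bias', 'out.weight', 'out.bias']

total = sum(p.numel() for p in nested.parameters())
print(total)  # 25
```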

How it works mechanically: When you call model.parameters(), PyTorch walks through every submodule you registered (via self.layer = nn.Linear(...)) and yields their weight and bias tensors. Each parameter has requires_grad=True by default, meaning gradients will be computed for it during backprop. You usually pass the generator straight to an optimizer, as in torch.optim.SGD(model.parameters(), lr=0.01), or iterate over it yourself to inspect the tensors.
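The link between requires_grad and backprop can be checked directly. The sketch below uses a single nn.Linear layer and an all-ones input, both chosen only to keep the gradient values easy to predict.

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 1)

# Fresh parameters require gradients but do not hold any yet.
print(all(p.requires_grad for p in layer.parameters()))  # True
print(all(p.grad is None for p in layer.parameters()))   # True

# Two all-ones samples; the loss is the sum of both outputs.
loss = layer(torch.ones(2, 3)).sum()
loss.backward()

# After backward(), every parameter carries a .grad of its own shape.
grad_shapes = {name: tuple(p.grad.shape) for name, p in layer.named_parameters()}
print(grad_shapes)  # {'weight': (1, 3), 'bias': (1,)}
```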

When to use it: Pass model.parameters() to your optimizer whenever the whole model should train. Use it to inspect weight magnitudes during debugging, or to freeze parameters selectively: set param.requires_grad = False, then pass only the parameters that still require gradients to the optimizer.
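As a rough illustration of selective freezing, the sketch below freezes the first layer of a small nn.Sequential model (an arbitrary stand-in, not part of the lesson's SimpleNet), takes one SGD step, and checks that the frozen weights did not move.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 3), nn.Linear(3, 1))

# Freeze the first layer.
for p in net[0].parameters():
    p.requires_grad = False

# Hand the optimizer only the parameters that still require gradients.
trainable = [p for p in net.parameters() if p.requires_grad]
opt = torch.optim.SGD(trainable, lr=0.1)

frozen_before = net[0].weight.clone()

loss = net(torch.randn(8, 4)).pow(2).mean()
loss.backward()
opt.step()

print(len(trainable))                              # 2 (weight and bias of the second layer)
print(net[0].weight.grad is None)                  # True: no gradient was computed
print(torch.equal(net[0].weight, frozen_before))   # True: frozen weights are unchanged
```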

Analogy

Think of <code>model.parameters()</code> like asking a company for a list of all employee paychecks. The company (your model) knows about everyone on payroll because employees are properly registered in HR (submodules). You don't have to hunt through filing cabinets yourself: the company gives you the complete list automatically.

Code

python

import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 5)
        self.fc2 = nn.Linear(5, 2)
    
    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        return x

model = SimpleNet()

print("All parameters:")
for name, param in model.named_parameters():
    print(f"{name}: shape {param.shape}, requires_grad={param.requires_grad}")

print("\nTotal parameter count:")
total_params = sum(p.numel() for p in model.parameters())
print(f"Total: {total_params}")

print("\nPassing to optimizer:")
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
print(f"Optimizer holds {len(optimizer.param_groups[0]['params'])} tensors in {len(optimizer.param_groups)} parameter group")

print("\nFreezing first layer:")
for param in model.fc1.parameters():
    param.requires_grad = False

print("\nAfter freezing fc1:")
for name, param in model.named_parameters():
    print(f"{name}: requires_grad={param.requires_grad}")

Output

All parameters:
fc1.weight: shape torch.Size([5, 10]), requires_grad=True
fc1.bias: shape torch.Size([5]), requires_grad=True
fc2.weight: shape torch.Size([2, 5]), requires_grad=True
fc2.bias: shape torch.Size([2]), requires_grad=True

Total parameter count:
Total: 67

Passing to optimizer:
Optimizer holds 4 tensors in 1 parameter group

Freezing first layer:

After freezing fc1:
fc1.weight: requires_grad=False
fc1.bias: requires_grad=False
fc2.weight: requires_grad=True
fc2.bias: requires_grad=True

What just happened?

We created a two-layer network, then used <code>model.named_parameters()</code> to inspect all weights and biases (4 total tensors). We counted total parameters (67 = 5×10 + 5 + 2×5 + 2). We created an SGD optimizer pointing to all parameters. Finally, we manually set <code>requires_grad=False</code> on the first layer's parameters, so gradients won't be computed for them: useful for transfer learning where you freeze early layers.

Common gotcha

Developers often forget that model.parameters() returns a generator, not a list. Once you have iterated over a given generator object it is exhausted, so a second loop over the same variable yields nothing. Either call model.parameters() again or convert to a list if you need multiple passes: params_list = list(model.parameters()). Also, only registered parameters show up: a tensor must be wrapped in nn.Parameter, and layers kept in a plain Python list or dict must go in nn.ModuleList or nn.ModuleDict instead.
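Both halves of this gotcha are easy to reproduce. In the sketch below, PlainList and Registered are hypothetical classes written only to contrast a plain Python list with nn.ModuleList.

```python
import torch.nn as nn

model = nn.Linear(10, 5)

params = model.parameters()                  # one generator object
first_pass = len(list(params))               # consumes it
second_pass = len(list(params))              # nothing left in the same object
fresh_pass = len(list(model.parameters()))   # a new call starts over
print(first_pass, second_pass, fresh_pass)   # 2 0 2

class PlainList(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = [nn.Linear(2, 2)]                  # plain list: not registered

class Registered(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(2, 2)])   # registered container

plain_count = len(list(PlainList().parameters()))
registered_count = len(list(Registered().parameters()))
print(plain_count, registered_count)         # 0 2
```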

Error recovery

TypeError: 'NoneType' object is not iterable

You passed <code>None</code> to the optimizer instead of <code>model.parameters()</code>. Check that your model is initialized before creating the optimizer. Correct: <code>optim.SGD(model.parameters(), lr=0.01)</code>.

TypeError: 'generator' object is not subscriptable

You tried to index the generator directly with <code>model.parameters()[0]</code>, without a <code>list()</code> conversion. Generators don't support indexing. Convert first: <code>params = list(model.parameters()); param_zero = params[0]</code>.
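A short sketch that reproduces the error and shows two ways around it. Using next(iter(...)) is one option when only the first parameter is needed; it is offered here as an alternative, not as something the lesson prescribes.

```python
import torch.nn as nn

model = nn.Linear(10, 5)

try:
    model.parameters()[0]                  # indexing the generator directly
except TypeError as err:
    message = str(err)
    print(message)                         # 'generator' object is not subscriptable

first = next(iter(model.parameters()))     # first parameter, no list needed
params = list(model.parameters())          # full list when you need indexing
print(first is params[0])                  # True: same tensor object
print(tuple(params[0].shape))              # (5, 10)
```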

Loss does not decrease during training

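The optimizer may not be holding the parameters your model is actually using: for example, it was built from a different model instance, the model was re-created after the optimizer, or every parameter has <code>requires_grad=False</code>. (Passing no parameters at all fails earlier, with <code>ValueError: optimizer got an empty parameter list</code>.) Verify with <code>print(optimizer.param_groups[0]['params'])</code>: it should list the same tensors as <code>model.parameters()</code>.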
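Printing the tensors works, but comparing by identity is less error-prone. The helper below, optimizer_covers, is a hypothetical name introduced for this sketch and is not part of PyTorch.

```python
import torch
import torch.nn as nn

def optimizer_covers(model, optimizer):
    """Return True if every trainable model parameter is held by the optimizer."""
    held = {id(p) for group in optimizer.param_groups for p in group["params"]}
    return all(id(p) in held for p in model.parameters() if p.requires_grad)

model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
covered_before = optimizer_covers(model, optimizer)
print(covered_before)   # True

# Re-creating the model leaves the optimizer pointing at the old tensors.
model = nn.Linear(4, 2)
covered_after = optimizer_covers(model, optimizer)
print(covered_after)    # False
```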

Experienced dev note

In practice, you'll use model.named_parameters() more than model.parameters() because it's easier to debug (you see layer names). Most of the time you just write torch.optim.Adam(model.parameters(), lr=1e-3) and move on, but understanding what's inside that generator is critical when you need to freeze layers for fine-tuning, implement custom optimizers, or save/load weights selectively. One hidden trap: if you create a custom nn.Module and store a tensor as a plain attribute (self.my_weight = torch.randn(...)), it won't show up in parameters(). Wrap it in nn.Parameter() or register it via self.register_parameter().
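The trap is easiest to see side by side. Scales and its attribute names are hypothetical, made up for this sketch.

```python
import torch
import torch.nn as nn

class Scales(nn.Module):
    def __init__(self):
        super().__init__()
        self.plain = torch.randn(3)                    # plain tensor: not registered
        self.wrapped = nn.Parameter(torch.randn(3))    # registered on assignment
        self.register_parameter("explicit", nn.Parameter(torch.randn(3)))

names = [name for name, _ in Scales().named_parameters()]
print(names)  # ['wrapped', 'explicit']
```

An optimizer built from this module's parameters() would never see the plain tensor, so it would never be updated.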

Check your understanding

You freeze the weights of your ResNet backbone and fine-tune only the classification head on a new dataset. Write pseudocode showing how you'd (1) freeze backbone parameters, (2) create an optimizer that only updates the head, and (3) verify the optimizer only has head parameters. What would optimizer.param_groups[0]['params'] contain, and why?

Show answer hint

A correct answer identifies that you iterate <code>model.backbone.parameters()</code> and set <code>requires_grad=False</code>, then pass only <code>model.head.parameters()</code> (or filter on <code>requires_grad</code>) to the optimizer. The optimizer's param_groups would contain only the head's weight and bias tensors because that's all you passed to it.
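One possible shape for the answer, as a sketch only. TinyClassifier is a hypothetical stand-in for a ResNet, and the attribute names backbone and head are assumptions that will differ in a real model.

```python
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    """Hypothetical stand-in for a pretrained backbone plus a classification head."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(8, 8), nn.ReLU())
        self.head = nn.Linear(8, 3)

    def forward(self, x):
        return self.head(self.backbone(x))

model = TinyClassifier()

# (1) Freeze the backbone.
for p in model.backbone.parameters():
    p.requires_grad = False

# (2) Give the optimizer only the head's parameters.
optimizer = torch.optim.SGD(model.head.parameters(), lr=0.01)

# (3) Compare the optimizer's tensors with the head's, by identity.
held = optimizer.param_groups[0]["params"]
head_params = list(model.head.parameters())
same_tensors = len(held) == len(head_params) and all(
    a is b for a, b in zip(held, head_params)
)
print(len(held), same_tensors)  # 2 True
```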

VERSION Older PyTorch code often modified parameters through the .data attribute (param.data.zero_()). Current code usually performs such in-place updates inside a torch.no_grad() block instead, so autograd does not record them. No breaking change in parameter access itself between 2.6.x and 2.11.x, but the idiom shifted toward functional APIs.
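For reference, here is a minimal sketch of a hand-written gradient step that updates parameters in place under torch.no_grad(), one commonly used alternative to touching .data. The learning rate and the all-ones input are arbitrary choices for the illustration.

```python
import torch
import torch.nn as nn

layer = nn.Linear(2, 1)
layer(torch.ones(1, 2)).sum().backward()

lr = 0.1
# What a plain gradient-descent step should produce for the weight.
expected = layer.weight.detach() - lr * layer.weight.grad

# Update in place without recording the operation in the autograd graph.
with torch.no_grad():
    for p in layer.parameters():
        p -= lr * p.grad
        p.grad = None          # clear the gradient for the next iteration

print(torch.allclose(layer.weight, expected))  # True
print(layer.weight.requires_grad)              # True: still a trainable parameter
```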

Next, learn about <code>model.state_dict()</code> to save and load all these parameters together, so you don't lose training progress.
