model.parameters(): accessing weights
Why this matters
You need to access model weights to pass them to an optimizer, inspect training progress, save/load models, or freeze specific layers during fine-tuning. This is the fundamental bridge between your model and the training loop.
Explanation
What it is: model.parameters() is a method on any nn.Module that returns a generator over every registered parameter (weights and biases) in your network, including any you have frozen. It traverses the module tree recursively, so it finds parameters in nested layers automatically.
How it works mechanically: When you call model.parameters(), PyTorch walks through every submodule you registered (via self.layer = nn.Linear(...)) and yields their weight and bias tensors. Each parameter has requires_grad=True by default, meaning gradients will be computed for it during backprop. You can iterate over the generator yourself or hand it straight to an optimizer, as in torch.optim.SGD(model.parameters(), lr=0.01).
When to use it: Pass model.parameters() to your optimizer in the common case where every layer trains. Use it to inspect weight magnitudes during debugging, or to selectively freeze parameters with param.requires_grad = False and then pass only the unfrozen ones to the optimizer.
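The recursive traversal can be seen in a small sketch. The Block and Outer classes below are hypothetical names made up for illustration; the point is only that parameters nested two levels deep are still found, and that layers without weights (here nn.ReLU) contribute nothing.

```python
import torch
import torch.nn as nn

# Hypothetical nested model, for illustration only.
class Block(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.Sequential names its children by index: 0, 1, 2
        self.inner = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

    def forward(self, x):
        return self.inner(x)

class Outer(nn.Module):
    def __init__(self):
        super().__init__()
        self.block = Block()        # assigning a module registers it as a submodule
        self.out = nn.Linear(2, 1)  # same here

    def forward(self, x):
        return self.out(self.block(x))

model = Outer()

# parameters() hands back a generator, not a list.
gen = model.parameters()
print(type(gen).__name__)

# named_parameters() walks the same tree and shows the dotted path of each tensor.
names = [name for name, _ in model.named_parameters()]
print(names)

# Every parameter starts out with requires_grad=True.
all_require_grad = all(p.requires_grad for p in model.parameters())
print(all_require_grad)
```

The dotted names (block.inner.0.weight and so on) mirror the attribute path from the top-level module down to each layer, which is why named_parameters() is handy when you need to tell tensors apart.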
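As a rough sketch of the two uses mentioned above (inspecting magnitudes and freezing), assuming a small nn.Sequential model chosen purely for illustration: the loop prints an L2 norm per tensor, then the first layer is frozen and only the remaining parameters are handed to the optimizer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 2))

# Inspect weight magnitudes (detach so the norm is not tracked by autograd).
for name, p in model.named_parameters():
    print(f"{name}: L2 norm = {p.detach().norm().item():.4f}")

# Freeze the first Linear layer.
for p in model[0].parameters():
    p.requires_grad = False

# Hand only the still-trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.01)
n_in_optimizer = len(optimizer.param_groups[0]["params"])
print(f"Tensors in optimizer: {n_in_optimizer}")

# One training step: the frozen layer should come out unchanged.
before = model[0].weight.detach().clone()
loss = model(torch.randn(8, 10)).sum()
loss.backward()
optimizer.step()
frozen_unchanged = torch.equal(before, model[0].weight.detach())
print(f"Frozen layer unchanged: {frozen_unchanged}")
```

Filtering on requires_grad is one common way to build the trainable list; passing model[2].parameters() directly would work equally well for this particular model.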
Analogy
Think of model.parameters() like asking a company for a list of all employee paychecks. The company (your model) knows about everyone on payroll because employees are properly registered in HR (submodules). You don't have to hunt through filing cabinets yourself: the company gives you the complete list automatically.
Code
import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Assigning nn.Module instances as attributes registers them as submodules
        self.fc1 = nn.Linear(10, 5)
        self.fc2 = nn.Linear(5, 2)

    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        return x

model = SimpleNet()

print("All parameters:")
for name, param in model.named_parameters():
    print(f"{name}: shape {param.shape}, requires_grad={param.requires_grad}")

print("\nTotal parameter count:")
total_params = sum(p.numel() for p in model.parameters())
print(f"Total: {total_params}")

print("\nPassing to optimizer:")
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# All tensors passed this way land in a single parameter group
n_groups = len(optimizer.param_groups)
n_tensors = len(optimizer.param_groups[0]["params"])
print(f"Optimizer created with {n_groups} parameter group holding {n_tensors} tensors")

print("\nFreezing first layer:")
for param in model.fc1.parameters():
    param.requires_grad = False

print("\nAfter freezing fc1:")
for name, param in model.named_parameters():
    print(f"{name}: requires_grad={param.requires_grad}")

Output:
All parameters:
fc1.weight: shape torch.Size([5, 10]), requires_grad=True
fc1.bias: shape torch.Size([5]), requires_grad=True
fc2.weight: shape torch.Size([2, 5]), requires_grad=True
fc2.bias: shape torch.Size([2]), requires_grad=True

Total parameter count:
Total: 67

Passing to optimizer:
Optimizer created with 1 parameter group holding 4 tensors

Freezing first layer:

After freezing fc1:
fc1.weight: requires_grad=False
fc1.bias: requires_grad=False
fc2.weight: requires_grad=True
fc2.bias: requires_grad=True
What just happened?
We created a two-layer network, then used model.named_parameters() to inspect all weights and biases (4 total tensors). We counted total parameters (67 = 5×10 + 5 + 2×5 + 2). We created an SGD optimizer that holds all four tensors in a single parameter group. Finally, we manually set requires_grad=False on the first layer's parameters, so gradients won't be computed for them, which is useful for transfer learning where you freeze early layers.
Common gotcha
Developers often forget that model.parameters() returns a generator, not a list. You can't index it, and a generator you stored in a variable is exhausted after one pass (calling model.parameters() again gives you a fresh one). Convert to a list if you need indexing or multiple passes over the same object: params_list = list(model.parameters()). Also, parameters added to the model after you create the optimizer are not tracked by it, and tensors stored as plain Python attributes never show up at all: assign nn.Module or nn.Parameter objects so they get registered.
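The generator behavior is easy to check in a few lines. This is a minimal sketch using a single nn.Linear layer as a stand-in model; the variable names are illustrative only.

```python
import torch.nn as nn

model = nn.Linear(3, 2)

# A stored generator can be consumed only once.
gen = model.parameters()
first_pass = [p.shape for p in gen]
second_pass = [p.shape for p in gen]  # empty: gen is already exhausted

# Calling parameters() again returns a brand-new generator.
fresh_pass = [p.shape for p in model.parameters()]

# Generators cannot be indexed.
try:
    model.parameters()[0]
except TypeError as err:
    error_message = str(err)
    print(f"TypeError: {error_message}")

# A list supports indexing and repeated iteration.
params_list = list(model.parameters())
print(len(first_pass), len(second_pass), len(fresh_pass), len(params_list))
print(params_list[0].shape)
```

If you only loop once, or pass the generator straight to an optimizer, no conversion is needed; the list is worth building only when you index or loop repeatedly.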
Error recovery
RuntimeError: param should be a Tensor, not None
TypeError: 'generator' object is not subscriptable
Loss does not decrease during training
Experienced dev note
When debugging, you'll reach for model.named_parameters() more often than model.parameters() because you see the layer names. Most of the time you just write torch.optim.Adam(model.parameters(), lr=1e-3) and move on, but understanding what's inside that generator is critical when you need to freeze layers for fine-tuning, implement custom optimizers, or save/load weights selectively. One hidden trap: if you create a custom nn.Module and store a tensor as a plain attribute (self.my_weight = torch.randn(...)), it won't show up in parameters(). You must wrap it in nn.Parameter() or register it via self.register_parameter().
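The registration trap can be sketched as follows. The Scaled module and its attribute names (plain, scale, shift) are hypothetical, invented here to contrast the three ways of storing a tensor on a module.

```python
import torch
import torch.nn as nn

# Hypothetical module, for illustration only.
class Scaled(nn.Module):
    def __init__(self):
        super().__init__()
        # Plain tensor attribute: NOT registered, invisible to parameters()
        self.plain = torch.randn(3)
        # Wrapped in nn.Parameter: registered automatically on assignment
        self.scale = nn.Parameter(torch.ones(3))
        # Explicit registration: equivalent result, useful for dynamic names
        self.register_parameter("shift", nn.Parameter(torch.zeros(3)))

    def forward(self, x):
        return x * self.scale + self.shift

module = Scaled()
registered = [name for name, _ in module.named_parameters()]
print(registered)
```

An optimizer built from module.parameters() would update scale and shift but never touch plain, which is a frequent cause of a custom weight that silently refuses to train.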
Check your understanding
You freeze the weights of your ResNet backbone and fine-tune only the classification head on a new dataset. Write pseudocode showing how you'd (1) freeze backbone parameters, (2) create an optimizer that only updates the head, and (3) verify the optimizer only has head parameters. What would optimizer.param_groups[0]['params'] contain, and why?
Answer hint
A correct answer identifies that you iterate model.backbone.parameters() and set requires_grad=False, then pass only model.head.parameters() (or use a condition) to the optimizer. The optimizer's param_groups would contain only the head's weight and bias tensors because that's all you passed to it.
In older code, the .data attribute on parameters was common (param.data.zero_()). Modern code uses param.detach() or in-place operations inside a torch.no_grad() block. No breaking change in parameter access itself between 2.6.x and 2.11.x, but the idiom shifted toward functional APIs.
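The difference between the two idioms can be sketched side by side. This is a minimal illustration on a single nn.Linear layer; the fill values are arbitrary.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)

# Older idiom: reach through .data to modify values outside autograd.
layer.bias.data.zero_()

# Current idiom: in-place operations inside torch.no_grad().
with torch.no_grad():
    layer.weight.fill_(0.5)
    layer.bias.add_(1.0)

# detach() gives a view of the values that is cut off from the autograd graph.
snapshot = layer.weight.detach()
print(snapshot.requires_grad)
print(layer.bias)
```

Both idioms change the stored values without recording the operation for backprop; the no_grad form is generally preferred because autograd can still detect unsafe in-place modifications that .data would hide.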