How to use an optimizer in PyTorch
Quick answer
In PyTorch, an optimizer such as torch.optim.SGD or torch.optim.Adam updates model parameters when you call optimizer.step() after computing gradients with loss.backward(). Initialize the optimizer with the model's parameters and a learning rate; then, in each training iteration, zero the gradients, compute the loss, backpropagate, and update the weights.

Prerequisites
- Python 3.8+
- pip install torch>=2.0
Setup
Install PyTorch if not already installed. Use the official command from PyTorch website or run:
pip install torch torchvision

Step by step
This example shows how to define a simple linear model, use torch.optim.SGD optimizer, and run one training step with gradient computation and parameter update.
import torch
import torch.nn as nn
import torch.optim as optim
# Define a simple linear model
model = nn.Linear(2, 1)
# Define optimizer with model parameters and learning rate
optimizer = optim.SGD(model.parameters(), lr=0.1)
# Dummy input and target
inputs = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
targets = torch.tensor([[1.0], [2.0]])
# Forward pass
outputs = model(inputs)
# Compute mean squared error loss
criterion = nn.MSELoss()
loss = criterion(outputs, targets)
print(f'Initial loss: {loss.item():.4f}')
# Zero gradients before backward pass
optimizer.zero_grad()
# Backward pass to compute gradients
loss.backward()
# Update model parameters
optimizer.step()
# Forward pass after one update
outputs = model(inputs)
loss = criterion(outputs, targets)
print(f'Loss after one optimizer step: {loss.item():.4f}')

Output
Initial loss: 3.1234
Loss after one optimizer step: 2.9876
Exact values will differ on each run because nn.Linear is randomly initialized.
Common variations
- Use a different optimizer, such as Adam or RMSprop, by replacing optim.SGD.
- Adjust the learning rate or add weight decay for regularization.
- Use optimizer.zero_grad(set_to_none=True) for a slight performance gain (this is the default since PyTorch 2.0).
- In asynchronous or distributed training, synchronize optimizer steps accordingly.
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-5)
# Typical training loop snippet
for epoch in range(epochs):
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, targets)
    loss.backward()
    optimizer.step()

Troubleshooting
- If the loss does not decrease, check that optimizer.zero_grad() is called before loss.backward().
- If gradients are None, verify that the model's parameters were passed to the optimizer.
- For exploding gradients, try gradient clipping or reduce learning rate.
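The gradient-clipping suggestion above can be sketched as follows, using torch.nn.utils.clip_grad_norm_; clipping is applied between loss.backward() and optimizer.step(). The model, data, and max_norm value are illustrative placeholders, not prescriptions:

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Illustrative toy model and data (same shapes as the main example)
model = nn.Linear(2, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()

inputs = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
targets = torch.tensor([[1.0], [2.0]])

optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()

# Clip the total gradient norm to 1.0 before the parameter update.
# clip_grad_norm_ returns the total norm *before* clipping, which is
# handy for logging and for spotting exploding gradients.
total_norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
print(f'Gradient norm before clipping: {float(total_norm):.4f}')

optimizer.step()
```

Because the pre-clipping norm is returned, you can monitor it across iterations to decide whether clipping is actually firing or whether a lower learning rate would suffice.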
Key Takeaways
- Always initialize the optimizer with model parameters and a learning rate.
- Call optimizer.zero_grad() before loss.backward() to reset gradients.
- Use optimizer.step() to update model weights after backpropagation.
- Choose optimizer type and hyperparameters based on your task.
- Monitor loss to ensure optimizer updates improve training.
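The takeaways above can be combined into a minimal end-to-end loop. This is a sketch using the same toy model and data as the main example; the epoch count, seed, and the smaller learning rate are illustrative choices:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # fixed seed so repeated runs behave the same

# Toy model and data, as in the step-by-step example
model = nn.Linear(2, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

inputs = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
targets = torch.tensor([[1.0], [2.0]])

losses = []
for epoch in range(50):
    optimizer.zero_grad()               # reset gradients from the previous step
    outputs = model(inputs)             # forward pass
    loss = criterion(outputs, targets)  # compute loss
    loss.backward()                     # backpropagate gradients
    optimizer.step()                    # update parameters
    losses.append(loss.item())          # track loss to monitor progress

print(f'first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}')
```

Tracking the loss per iteration, as the last takeaway suggests, is the simplest way to confirm the optimizer is actually improving the model.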