How to use an optimizer in PyTorch
Quick answer
In PyTorch, an optimizer such as torch.optim.SGD or torch.optim.Adam updates model parameters when you call optimizer.step() after computing gradients with loss.backward(). Initialize the optimizer with the model's parameters and a learning rate; then, in each training iteration, zero the gradients, compute the loss, backpropagate, and update the weights.

Prerequisites
- Python 3.8+
- pip install torch>=2.0
Setup
Install PyTorch if not already installed. Use the official command from PyTorch website or run:
pip install torch torchvision

Step by step
This example shows how to define a simple linear model, use torch.optim.SGD optimizer, and run one training step with gradient computation and parameter update.
import torch
import torch.nn as nn
import torch.optim as optim
# Define a simple linear model
model = nn.Linear(2, 1)
# Define optimizer with model parameters and learning rate
optimizer = optim.SGD(model.parameters(), lr=0.1)
# Dummy input and target
inputs = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
targets = torch.tensor([[1.0], [2.0]])
# Forward pass
outputs = model(inputs)
# Compute mean squared error loss
criterion = nn.MSELoss()
loss = criterion(outputs, targets)
print(f'Initial loss: {loss.item():.4f}')
# Zero gradients before backward pass
optimizer.zero_grad()
# Backward pass to compute gradients
loss.backward()
# Update model parameters
optimizer.step()
# Forward pass after one update
outputs = model(inputs)
loss = criterion(outputs, targets)
print(f'Loss after one optimizer step: {loss.item():.4f}')

Output
Initial loss: 3.1234
Loss after one optimizer step: 2.9876
Exact values will differ on each run because nn.Linear is randomly initialized.
Common variations
- Use a different optimizer, such as Adam or RMSprop, by replacing optim.SGD.
- Adjust the learning rate or add weight decay for regularization.
- Use optimizer.zero_grad(set_to_none=True) for a slight performance gain (this is the default since PyTorch 2.0).
- In asynchronous or distributed training, synchronize optimizer steps accordingly.
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-5)
# Typical training loop snippet
for epoch in range(epochs):
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, targets)
    loss.backward()
    optimizer.step()

Troubleshooting
- If the loss does not decrease, check that optimizer.zero_grad() is called before loss.backward().
- If gradients are None, verify that the model's parameters were passed to the optimizer.
- For exploding gradients, try gradient clipping or reduce learning rate.
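The gradient-clipping suggestion above can be sketched as follows, using torch.nn.utils.clip_grad_norm_; clipping is applied between loss.backward() and optimizer.step(). The model, data, and max_norm value are illustrative placeholders, not prescriptions:

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Illustrative toy model and data (same shapes as the main example)
model = nn.Linear(2, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()

inputs = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
targets = torch.tensor([[1.0], [2.0]])

optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()

# Clip the total gradient norm to 1.0 before the parameter update.
# clip_grad_norm_ returns the total norm *before* clipping, which is
# handy for logging and for spotting exploding gradients.
total_norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
print(f'Gradient norm before clipping: {float(total_norm):.4f}')

optimizer.step()
```

Because the pre-clipping norm is returned, you can monitor it across iterations to decide whether clipping is actually firing or whether a lower learning rate would suffice.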
Key Takeaways
- Always initialize the optimizer with model parameters and a learning rate.
- Call optimizer.zero_grad() before loss.backward() to reset gradients.
- Use optimizer.step() to update model weights after backpropagation.
- Choose optimizer type and hyperparameters based on your task.
- Monitor loss to ensure optimizer updates improve training.
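The takeaways above can be combined into a minimal end-to-end loop. This is a sketch using the same toy model and data as the main example; the epoch count, seed, and the smaller learning rate are illustrative choices:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # fixed seed so repeated runs behave the same

# Toy model and data, as in the step-by-step example
model = nn.Linear(2, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

inputs = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
targets = torch.tensor([[1.0], [2.0]])

losses = []
for epoch in range(50):
    optimizer.zero_grad()               # reset gradients from the previous step
    outputs = model(inputs)             # forward pass
    loss = criterion(outputs, targets)  # compute loss
    loss.backward()                     # backpropagate gradients
    optimizer.step()                    # update parameters
    losses.append(loss.item())          # track loss to monitor progress

print(f'first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}')
```

Tracking the loss per iteration, as the last takeaway suggests, is the simplest way to confirm the optimizer is actually improving the model.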