How to write a training loop in PyTorch
Quick answer
A training loop in PyTorch involves iterating over batches of data, forwarding inputs through the model, computing loss with a criterion, backpropagating gradients using loss.backward(), and updating model weights with the optimizer's step(). This loop runs for multiple epochs to train the model.

Prerequisites
- Python 3.8+
- pip install torch>=2.0
Setup
Install PyTorch if not already installed. Use the following command to install the latest stable version:
pip install torch torchvision

Step by step
This example shows a complete training loop for a simple neural network on dummy data using PyTorch. It covers model definition, data loading, loss calculation, backpropagation, and optimizer step.
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
# Define a simple model
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 1)

    def forward(self, x):
        return self.fc(x)

# Create dummy dataset
x = torch.randn(100, 10)  # 100 samples, 10 features
y = torch.randn(100, 1)   # 100 targets
dataset = TensorDataset(x, y)
dataloader = DataLoader(dataset, batch_size=16, shuffle=True)

# Initialize model, loss, optimizer
model = SimpleNet()
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

epochs = 5
for epoch in range(epochs):
    running_loss = 0.0
    for inputs, targets in dataloader:
        optimizer.zero_grad()               # Zero gradients from the previous batch
        outputs = model(inputs)             # Forward pass
        loss = criterion(outputs, targets)  # Compute loss
        loss.backward()                     # Backpropagation
        optimizer.step()                    # Update weights
        running_loss += loss.item() * inputs.size(0)
    epoch_loss = running_loss / len(dataloader.dataset)
    print(f"Epoch {epoch+1}/{epochs}, Loss: {epoch_loss:.4f}")

Output
Epoch 1/5, Loss: 1.0123
Epoch 2/5, Loss: 0.9876
Epoch 3/5, Loss: 0.9654
Epoch 4/5, Loss: 0.9452
Epoch 5/5, Loss: 0.9267

Exact values will vary from run to run, since the dataset is random.
Common variations
- Move the model and data to the GPU with .to(device) for faster training.
- Replace optim.SGD with optim.Adam for adaptive learning rates.
- Add a validation loop alongside training to monitor overfitting.
- Use learning rate schedulers to adjust learning rate during training.
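The variations above can be combined. Below is a minimal sketch of a loop with device handling, Adam, a StepLR scheduler, and a validation pass; the model, data, and hyperparameters (layer sizes, learning rate, step_size, gamma) are illustrative, not taken from the example above:

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 1).to(device)                   # move model to GPU if available
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)  # Adam instead of SGD
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)

train_loader = DataLoader(TensorDataset(torch.randn(80, 10), torch.randn(80, 1)),
                          batch_size=16, shuffle=True)
val_loader = DataLoader(TensorDataset(torch.randn(20, 10), torch.randn(20, 1)),
                        batch_size=16)

for epoch in range(4):
    model.train()
    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)  # move batch to device
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()  # halve the learning rate every 2 epochs

    model.eval()      # validation pass: no gradients needed
    with torch.no_grad():
        val_loss = sum(criterion(model(x.to(device)), y.to(device)).item() * x.size(0)
                       for x, y in val_loader) / len(val_loader.dataset)
    print(f"epoch {epoch+1}: val_loss={val_loss:.4f}, lr={scheduler.get_last_lr()[0]:.5f}")
```

Note the model.train()/model.eval() toggles and torch.no_grad() around validation; they matter once you use layers like dropout or batch norm, and they skip gradient bookkeeping during evaluation.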
Troubleshooting
- If you get RuntimeError: CUDA out of memory, reduce the batch size or move the model/data to the CPU.
- If the loss does not decrease, check data normalization and the learning rate.
- Ensure optimizer.zero_grad() is called before loss.backward() to avoid gradient accumulation.
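The gradient-accumulation pitfall is easy to demonstrate on a single tensor: calling backward() repeatedly without clearing gradients sums them, which is rarely what you want in a standard loop. A minimal sketch:

```python
import torch

w = torch.tensor([1.0], requires_grad=True)

loss = (w * 2).sum()
loss.backward()
first_grad = w.grad.clone()    # d(2*w)/dw = 2

loss = (w * 2).sum()
loss.backward()                # gradients were not cleared, so they accumulate
accumulated = w.grad.clone()   # now 4, not 2

w.grad.zero_()                 # clear the gradient, as optimizer.zero_grad() would
loss = (w * 2).sum()
loss.backward()
fresh = w.grad.clone()         # back to 2 after clearing

print(first_grad.item(), accumulated.item(), fresh.item())  # → 2.0 4.0 2.0
```

This accumulation behavior is deliberate in PyTorch (it enables techniques like gradient accumulation over micro-batches), which is why the loop must clear gradients explicitly each iteration.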
Key takeaways
- Always zero gradients before backpropagation with optimizer.zero_grad().
- Use loss.backward() to compute gradients and optimizer.step() to update weights.
- Wrap your data in a DataLoader for efficient batch processing.
- Move the model and data to the GPU with model.to(device) and inputs.to(device) for faster training.
- Monitor the training loss each epoch to ensure your model is learning.