What is the learning rate in neural networks?
The learning rate is a hyperparameter that controls how much the model's weights are updated during training. It determines the step size at each iteration while minimizing the loss function, directly affecting convergence speed and stability.
How it works
The learning rate acts like the speed dial for training a neural network. Imagine descending a hill to reach the lowest point (minimum loss). A high learning rate means taking large steps down the hill, which can speed up training but risks overshooting the minimum. A low learning rate means taking small steps, ensuring precise convergence but requiring more time. The learning rate scales the gradient during backpropagation to update the model's weights.
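The update rule described above can be sketched in a few lines. This is a minimal hand-rolled gradient step (not a full training loop): the loss is a simple quadratic chosen for illustration, and the weight moves by the gradient scaled by the learning rate.

```python
import torch

# One gradient-descent step by hand: w_new = w - lr * gradient
w = torch.tensor([2.0], requires_grad=True)
loss = (w ** 2).sum()   # toy quadratic loss with its minimum at w = 0
loss.backward()         # gradient dL/dw = 2w = 4.0

lr = 0.1
with torch.no_grad():
    w -= lr * w.grad    # step: 2.0 - 0.1 * 4.0, so w is now about 1.6
```

A larger `lr` would take a bigger step toward 0 (or past it), while a smaller one would barely move the weight, mirroring the hill-descent picture.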
Concrete example
Here is a simple PyTorch example showing how to set and use the learning rate with an optimizer:
```python
import torch
import torch.nn as nn
import torch.optim as optim

# Define a simple linear model
model = nn.Linear(10, 1)

# Set the learning rate
learning_rate = 0.01

# Use the SGD optimizer with the learning rate
optimizer = optim.SGD(model.parameters(), lr=learning_rate)

# Dummy input and target
inputs = torch.randn(5, 10)
target = torch.randn(5, 1)

# Forward pass
output = model(inputs)

# Compute mean squared error loss
loss = nn.MSELoss()(output, target)

# Backward pass and weight update
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(f"Loss: {loss.item():.4f}")
```

Because the inputs and targets are random, the printed loss value will differ from run to run.
When to use it
Start with a higher learning rate to speed up convergence, then reduce it if the training loss oscillates or diverges. Learning rate schedules or adaptive optimizers such as Adam can adjust it dynamically. Avoid learning rates that are too low (training crawls) or too high (training becomes unstable).
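As a concrete example of a schedule, PyTorch's built-in `StepLR` scheduler decays the optimizer's learning rate by a fixed factor at regular intervals. The step size and decay factor below are arbitrary illustrative choices, and the training step itself is elided.

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Halve the learning rate every 10 epochs
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

for epoch in range(30):
    # ... forward pass, loss.backward(), optimizer.step() would go here ...
    optimizer.step()
    scheduler.step()

# After three decays: 0.1 * 0.5**3
print(optimizer.param_groups[0]["lr"])  # 0.0125
```

Adaptive optimizers like `optim.Adam` can be combined with schedulers in the same way; the scheduler scales the base learning rate while the optimizer adapts per-parameter step sizes.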
Key terms
| Term | Definition |
|---|---|
| Learning rate | Hyperparameter controlling the step size of weight updates during training. |
| Gradient | The vector of partial derivatives of the loss with respect to model parameters. |
| Optimizer | Algorithm that updates model weights based on gradients and learning rate. |
| Convergence | Process of the model's loss reaching a minimum during training. |
Key takeaways
- Set the learning rate carefully to balance training speed and stability.
- Use PyTorch optimizers with the lr parameter to control learning rate.
- Adjust learning rate dynamically with schedulers or adaptive optimizers for better results.