What is gradient descent in deep learning?
Gradient descent is an optimization algorithm used in deep learning to minimize the loss function by iteratively updating model parameters in the direction of the negative gradient. It helps neural networks learn by adjusting weights to reduce prediction errors.
How it works
Gradient descent works by calculating the gradient (partial derivatives) of the loss function with respect to each model parameter. This gradient indicates the direction of steepest increase, so the algorithm updates parameters in the opposite direction to reduce the loss. Imagine descending a hill blindfolded by feeling the slope under your feet and stepping downhill to reach the lowest point.
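The "step downhill" idea can be sketched in a few lines of plain Python on a one-dimensional function f(w) = (w - 3)**2, whose gradient is 2 * (w - 3). The function and starting point are illustrative choices, not part of the example below:

```python
# Gradient of the illustrative loss f(w) = (w - 3)**2
def gradient(w):
    return 2 * (w - 3)

w = 0.0              # arbitrary starting point
learning_rate = 0.1

for step in range(100):
    # Move against the gradient: downhill on the loss surface
    w -= learning_rate * gradient(w)

print(w)  # approaches the minimum at w = 3
```

Each update multiplies the distance to the minimum by (1 - 2 * learning_rate), so with a step size of 0.1 the error shrinks by 20% per iteration.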
Concrete example
This PyTorch example demonstrates gradient descent to fit a simple linear model y = wx + b by minimizing mean squared error loss.
```python
import torch

# Sample data: y = 2x + 1
x = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
y = torch.tensor([[3.0], [5.0], [7.0], [9.0]])

# Initialize parameters w and b
w = torch.randn(1, requires_grad=True)
b = torch.randn(1, requires_grad=True)
learning_rate = 0.01

for epoch in range(1000):
    # Forward pass: compute predictions
    y_pred = x * w + b
    # Compute mean squared error loss
    loss = ((y_pred - y) ** 2).mean()
    # Backward pass: compute gradients
    loss.backward()
    # Update parameters using gradient descent
    with torch.no_grad():
        w -= learning_rate * w.grad
        b -= learning_rate * b.grad
        # Zero gradients for next iteration
        w.grad.zero_()
        b.grad.zero_()

# Print learned parameters
print(f"Learned weight: {w.item():.4f}")
print(f"Learned bias: {b.item():.4f}")
```

Output (the learned values approach the true parameters w = 2, b = 1):

Learned weight: 2.0000
Learned bias: 1.0000
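In practice you rarely write the update rule by hand: PyTorch's built-in torch.optim.SGD applies the same "parameter -= learning_rate * gradient" step. Here is the same fit rewritten with the optimizer; the manual seed is an illustrative choice added for reproducibility:

```python
import torch

torch.manual_seed(0)  # illustrative: makes the random init reproducible

x = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
y = torch.tensor([[3.0], [5.0], [7.0], [9.0]])

w = torch.randn(1, requires_grad=True)
b = torch.randn(1, requires_grad=True)

# SGD performs the same update as the manual loop above
optimizer = torch.optim.SGD([w, b], lr=0.01)

for epoch in range(1000):
    y_pred = x * w + b
    loss = ((y_pred - y) ** 2).mean()
    optimizer.zero_grad()  # clear gradients from the previous step
    loss.backward()        # compute new gradients
    optimizer.step()       # apply the gradient-descent update

print(f"w ≈ {w.item():.2f}, b ≈ {b.item():.2f}")
```

Using the optimizer object also makes it trivial to swap in variants such as momentum or Adam without touching the training loop.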
When to use it
Use gradient descent when training neural networks or other differentiable models to minimize a loss function. It is essential for supervised learning tasks like classification and regression. Avoid using it when the loss landscape is non-differentiable or when closed-form solutions exist, as gradient descent can be slow or ineffective in those cases.
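As a contrast, ordinary linear regression like the example above does have a closed-form solution, so iterative gradient descent is unnecessary there. A sketch using NumPy's least-squares solver (an illustrative alternative, not part of the original example):

```python
import numpy as np

# Same data as the PyTorch example: y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Design matrix with a column of ones for the bias term
A = np.column_stack([x, np.ones_like(x)])

# Solve the least-squares problem directly, with no iteration
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)

print(w, b)  # recovers w = 2, b = 1 up to floating point
```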
Key Takeaways
- Gradient descent iteratively updates model parameters by moving against the gradient of the loss function.
- It is the core optimization method for training deep learning models in frameworks like PyTorch.
- Choosing the right learning rate is critical for convergence speed and stability.
- Gradient descent requires the loss function to be differentiable with respect to model parameters.
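The learning-rate point can be seen on the simplest possible loss, f(w) = w**2 (gradient 2w): a small step size converges, while a step size above 1.0 makes the update overshoot the minimum by more than it corrects, so the iterates diverge. A minimal sketch, not from the original text:

```python
# Run gradient descent on f(w) = w**2 with a given learning rate
def run(lr, steps=20, w=1.0):
    for _ in range(steps):
        w -= lr * 2 * w  # each step multiplies w by (1 - 2 * lr)
    return w

print(run(0.1))  # |1 - 2*0.1| = 0.8 < 1: shrinks toward the minimum at 0
print(run(1.1))  # |1 - 2*1.1| = 1.2 > 1: overshoots and grows without bound
```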