What is gradient descent in deep learning?
Gradient descent is an optimization algorithm used in deep learning to minimize the loss function by iteratively updating model parameters in the direction of the negative gradient. It helps neural networks learn by adjusting weights to reduce prediction errors.
How it works
Gradient descent works by calculating the gradient (partial derivatives) of the loss function with respect to each model parameter. This gradient indicates the direction of steepest increase, so the algorithm updates parameters in the opposite direction to reduce the loss. Imagine descending a hill blindfolded by feeling the slope under your feet and stepping downhill to reach the lowest point.
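The "step downhill" idea can be sketched in a few lines of plain Python on a one-dimensional function f(w) = (w - 3)**2, whose gradient is 2 * (w - 3). The function and starting point are illustrative choices, not part of the example below:

```python
# Gradient of the illustrative loss f(w) = (w - 3)**2
def gradient(w):
    return 2 * (w - 3)

w = 0.0              # arbitrary starting point
learning_rate = 0.1

for step in range(100):
    # Move against the gradient: downhill on the loss surface
    w -= learning_rate * gradient(w)

print(w)  # approaches the minimum at w = 3
```

Each update multiplies the distance to the minimum by (1 - 2 * learning_rate), so with a step size of 0.1 the error shrinks by 20% per iteration.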
Concrete example
This PyTorch example demonstrates gradient descent to fit a simple linear model y = wx + b by minimizing mean squared error loss.
```python
import torch

# Sample data: y = 2x + 1
x = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
y = torch.tensor([[3.0], [5.0], [7.0], [9.0]])

# Initialize parameters w and b
w = torch.randn(1, requires_grad=True)
b = torch.randn(1, requires_grad=True)
learning_rate = 0.01

for epoch in range(1000):
    # Forward pass: compute predictions
    y_pred = x * w + b
    # Compute mean squared error loss
    loss = ((y_pred - y) ** 2).mean()
    # Backward pass: compute gradients
    loss.backward()
    # Update parameters using gradient descent
    with torch.no_grad():
        w -= learning_rate * w.grad
        b -= learning_rate * b.grad
        # Zero gradients for next iteration
        w.grad.zero_()
        b.grad.zero_()

# Print learned parameters
print(f"Learned weight: {w.item():.4f}")
print(f"Learned bias: {b.item():.4f}")
```

Output (the learned values approach the true parameters w = 2, b = 1):

Learned weight: 2.0000
Learned bias: 1.0000
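In practice you rarely write the update rule by hand: PyTorch's built-in torch.optim.SGD applies the same "parameter -= learning_rate * gradient" step. Here is the same fit rewritten with the optimizer; the manual seed is an illustrative choice added for reproducibility:

```python
import torch

torch.manual_seed(0)  # illustrative: makes the random init reproducible

x = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
y = torch.tensor([[3.0], [5.0], [7.0], [9.0]])

w = torch.randn(1, requires_grad=True)
b = torch.randn(1, requires_grad=True)

# SGD performs the same update as the manual loop above
optimizer = torch.optim.SGD([w, b], lr=0.01)

for epoch in range(1000):
    y_pred = x * w + b
    loss = ((y_pred - y) ** 2).mean()
    optimizer.zero_grad()  # clear gradients from the previous step
    loss.backward()        # compute new gradients
    optimizer.step()       # apply the gradient-descent update

print(f"w ≈ {w.item():.2f}, b ≈ {b.item():.2f}")
```

Using the optimizer object also makes it trivial to swap in variants such as momentum or Adam without touching the training loop.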
When to use it
Use gradient descent when training neural networks or other differentiable models to minimize a loss function. It is essential for supervised learning tasks like classification and regression. Avoid using it when the loss landscape is non-differentiable or when closed-form solutions exist, as gradient descent can be slow or ineffective in those cases.
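As a contrast, ordinary linear regression like the example above does have a closed-form solution, so iterative gradient descent is unnecessary there. A sketch using NumPy's least-squares solver (an illustrative alternative, not part of the original example):

```python
import numpy as np

# Same data as the PyTorch example: y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Design matrix with a column of ones for the bias term
A = np.column_stack([x, np.ones_like(x)])

# Solve the least-squares problem directly, with no iteration
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)

print(w, b)  # recovers w = 2, b = 1 up to floating point
```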
Key Takeaways
- Gradient descent iteratively updates model parameters by moving against the gradient of the loss function.
- It is the core optimization method for training deep learning models in frameworks like PyTorch.
- Choosing the right learning rate is critical for convergence speed and stability.
- Gradient descent requires the loss function to be differentiable with respect to model parameters.
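The learning-rate point can be seen on the simplest possible loss, f(w) = w**2 (gradient 2w): a small step size converges, while a step size above 1.0 makes the update overshoot the minimum by more than it corrects, so the iterates diverge. A minimal sketch, not from the original text:

```python
# Run gradient descent on f(w) = w**2 with a given learning rate
def run(lr, steps=20, w=1.0):
    for _ in range(steps):
        w -= lr * 2 * w  # each step multiplies w by (1 - 2 * lr)
    return w

print(run(0.1))  # |1 - 2*0.1| = 0.8 < 1: shrinks toward the minimum at 0
print(run(1.1))  # |1 - 2*1.1| = 1.2 > 1: overshoots and grows without bound
```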