Concept beginner · 3 min read

What is gradient descent in machine learning?

Quick answer
Gradient descent is an optimization algorithm used in machine learning to minimize a loss function by iteratively adjusting model parameters in the direction of steepest descent, that is, opposite the gradient of the loss. This is how many models learn from data efficiently.

How it works

Gradient descent works by calculating the gradient (partial derivatives) of the loss function with respect to model parameters, then moving those parameters in the opposite direction of the gradient to reduce the loss. Imagine rolling a ball down a hill: the ball naturally moves downhill to the lowest point, similar to how parameters update to minimize error.

Concrete example

Here is a simple Python example of gradient descent minimizing the function f(x) = (x - 3)^2. The minimum is at x = 3, so the iterates should approach 3.

python

def f(x):
    return (x - 3) ** 2

def grad_f(x):
    return 2 * (x - 3)

x = 0  # initial guess
learning_rate = 0.1
iterations = 20

for i in range(iterations):
    grad = grad_f(x)
    x = x - learning_rate * grad
    print(f"Iteration {i+1}: x = {x:.4f}, f(x) = {f(x):.4f}")
output
Iteration 1: x = 0.6000, f(x) = 5.7600
Iteration 2: x = 1.0800, f(x) = 3.6864
Iteration 3: x = 1.4640, f(x) = 2.3593
Iteration 4: x = 1.7712, f(x) = 1.5099
Iteration 5: x = 2.0170, f(x) = 0.9664
Iteration 6: x = 2.2136, f(x) = 0.6185
Iteration 7: x = 2.3709, f(x) = 0.3958
Iteration 8: x = 2.4967, f(x) = 0.2533
Iteration 9: x = 2.5973, f(x) = 0.1621
Iteration 10: x = 2.6779, f(x) = 0.1038
Iteration 11: x = 2.7423, f(x) = 0.0664
Iteration 12: x = 2.7938, f(x) = 0.0425
Iteration 13: x = 2.8351, f(x) = 0.0272
Iteration 14: x = 2.8681, f(x) = 0.0174
Iteration 15: x = 2.8944, f(x) = 0.0111
Iteration 16: x = 2.9156, f(x) = 0.0071
Iteration 17: x = 2.9324, f(x) = 0.0046
Iteration 18: x = 2.9460, f(x) = 0.0029
Iteration 19: x = 2.9568, f(x) = 0.0019
Iteration 20: x = 2.9654, f(x) = 0.0012
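
With these settings, each update is x ← x − 0.1 · 2(x − 3), so the gap to the minimum shrinks by a factor of 0.8 per step. The learning rate decides that factor, and one that is too large overshoots and diverges. A minimal sketch illustrating this (the run helper is ours, not from the example above):

```python
def grad_f(x):
    return 2 * (x - 3)

def run(lr, steps=20, x=0.0):
    # Take `steps` gradient steps from the starting point and return the final x.
    for _ in range(steps):
        x = x - lr * grad_f(x)
    return x

print(run(0.1))  # gap shrinks by 0.8 per step: ends near 2.965
print(run(0.5))  # update x - 1.0*(x - 3) lands exactly on the minimum, x = 3.0
print(run(1.1))  # contraction factor |1 - 2.2| = 1.2 > 1: the iterates blow up
```

Here the contraction factor is |1 − 2·lr|, so any learning rate above 1.0 makes this particular problem diverge.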

When to use it

Use gradient descent when training machine learning models that require minimizing a differentiable loss function, such as linear regression, logistic regression, and neural networks. Avoid it when the loss function is non-differentiable or when closed-form solutions exist, as gradient descent can be slower or less precise in those cases.
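
To make the model-training case concrete, here is an illustrative sketch (synthetic data and settings chosen for this example, not taken from the article) of gradient descent fitting a one-variable linear regression y ≈ w·x + b by minimizing mean squared error, with one gradient per parameter:

```python
# Synthetic data that follows y = 2x + 1 exactly,
# so the fit should recover w ≈ 2 and b ≈ 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

w, b = 0.0, 0.0
lr = 0.05
n = len(xs)

for _ in range(2000):
    # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) with respect to w and b.
    grad_w = (2 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
    grad_b = (2 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
    w -= lr * grad_w
    b -= lr * grad_b

print(f"w = {w:.3f}, b = {b:.3f}")  # approaches w = 2.000, b = 1.000
```

The same loop works for logistic regression or a neural network; only the loss and its gradients change.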

Key terms

Gradient: Vector of partial derivatives indicating the slope of the loss function.
Loss function: A function that measures the error between predicted and true values.
Learning rate: A hyperparameter controlling the step size during parameter updates.
Parameters: Model variables adjusted to minimize the loss function.

Key takeaways

  • Gradient descent iteratively updates model parameters to minimize error by moving opposite the gradient.
  • Choosing an appropriate learning rate is critical for convergence speed and stability.
  • Gradient descent is fundamental for training differentiable machine learning models efficiently.
Verified 2026-04