Gradients

The gradient of a scalar-valued function is the vector of its partial derivatives.

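If $f$ is a function from $\mathbb{R}^n$ to $\mathbb{R}$, then its gradient is written as

$$\nabla f$$

and is defined by

$$\nabla f(x_1, \dots, x_n) = \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right)$$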

So the gradient packages all the first-order rates of change of a function into one vector.
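To make the definition concrete, here is a minimal sketch that approximates each partial derivative with a central difference and collects them into one list. The helper name `numerical_gradient`, the step size `h`, and the test function `example_f` are assumptions chosen for illustration.

```python
from typing import Callable, List, Sequence


def numerical_gradient(
    f: Callable[[Sequence[float]], float],
    point: Sequence[float],
    h: float = 1e-6,
) -> List[float]:
    """Approximate the gradient of f at `point` with central differences."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        # Central difference approximates the i-th partial derivative.
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def example_f(p: Sequence[float]) -> float:
    # Hypothetical test function: f(x, y, z) = x*y + z**2
    x, y, z = p
    return x * y + z ** 2


# The exact gradient is (y, x, 2z), so at (1, 2, 3) it is (2, 1, 6).
approx = numerical_gradient(example_f, [1.0, 2.0, 3.0])
print(approx)
```

The result has one entry per input variable, which is exactly the "one vector of partial derivatives" in the definition.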

In 2 variables

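If $f$ is a function of two variables $x$ and $y$, then

$$\nabla f(x, y) = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$$

where:

$\frac{\partial f}{\partial x}$ is the partial derivative with respect to $x$
$\frac{\partial f}{\partial y}$ is the partial derivative with respect to $y$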

This is why the gradient is closely connected to Partial Derivatives.

Main idea

The gradient tells you two important things at a point:

the direction of steepest increase
how quickly the function increases in that direction

So if you want to increase the value of a function as fast as possible, you move in the direction of the gradient.

If you want to decrease the value as fast as possible, you move in the direction of $-\nabla f$, the negative gradient.

This is the idea behind Gradient Descent.
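The idea of stepping against the gradient can be sketched in a few lines. This is only an illustration with a fixed step size; the function name `gradient_descent`, the parameters `step_size` and `num_steps`, and the bowl-shaped example function are assumptions, not part of any particular library.

```python
from typing import Callable, List, Sequence


def gradient_descent(
    grad: Callable[[Sequence[float]], List[float]],
    start: Sequence[float],
    step_size: float = 0.1,
    num_steps: int = 100,
) -> List[float]:
    """Repeatedly take a small step in the direction of -grad(point)."""
    point = list(start)
    for _ in range(num_steps):
        g = grad(point)
        point = [p - step_size * gi for p, gi in zip(point, g)]
    return point


def grad_bowl(p: Sequence[float]) -> List[float]:
    # Gradient of the hypothetical function f(x, y) = (x - 1)**2 + (y + 2)**2
    x, y = p
    return [2 * (x - 1), 2 * (y + 2)]


minimum = gradient_descent(grad_bowl, start=[5.0, 5.0])
print(minimum)  # close to [1.0, -2.0], the minimizer of f
```

Flipping the sign of the update (adding instead of subtracting) would move uphill along the gradient instead.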

Example

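Let $f(x, y)$ be a differentiable function of two variables.

Then

$$\nabla f(x, y) = \left( \frac{\partial f}{\partial x}(x, y), \frac{\partial f}{\partial y}(x, y) \right)$$

At the point $(a, b)$,

$$\nabla f(a, b) = \left( \frac{\partial f}{\partial x}(a, b), \frac{\partial f}{\partial y}(a, b) \right)$$

This means that near $(a, b)$, the function increases most rapidly in the direction of the vector $\nabla f(a, b)$.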
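As one concrete instance of these steps, take the function $f(x, y) = x^2 + 3xy$ and the point $(1, 2)$. Both are illustrative choices made for this sketch, not values taken from the text above.

```latex
\begin{aligned}
f(x, y) &= x^2 + 3xy \\
\nabla f(x, y) &= \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right) = (2x + 3y,\; 3x) \\
\nabla f(1, 2) &= (2 \cdot 1 + 3 \cdot 2,\; 3 \cdot 1) = (8, 3)
\end{aligned}
```

So for this choice of $f$, the function increases most rapidly near $(1, 2)$ in the direction of the vector $(8, 3)$.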

Geometric meaning

For a function $f(x, y)$, the graph $z = f(x, y)$ is a surface in 3D.

But it is often easier to think in terms of level curves, the curves along which $f$ takes a constant value $c$:

$$f(x, y) = c$$

The gradient at a point is perpendicular to the level curve through that point.

So:

level curves show where the function stays constant
the gradient points straight across them in the direction where the function rises the fastest

This is one of the most important geometric interpretations of the gradient.
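The perpendicularity claim can be checked numerically. The sketch below assumes a hypothetical function $f(x, y) = x^2 + 4y^2$, whose level curves are ellipses, walks along one level curve, and takes the dot product of the gradient with the curve's tangent vector at a few points.

```python
import math
from typing import List, Tuple


def grad_f(x: float, y: float) -> Tuple[float, float]:
    # Gradient of the hypothetical function f(x, y) = x**2 + 4*y**2.
    return (2 * x, 8 * y)


c = 4.0  # level value: the level curve is x**2 + 4*y**2 = c
r = math.sqrt(c)
dot_products: List[float] = []
for t in (0.0, 0.7, 2.0, 4.5):
    # Point on the level curve, parametrized as an ellipse.
    x, y = r * math.cos(t), (r / 2) * math.sin(t)
    # Tangent to the level curve: derivative of the parametrization in t.
    tangent = (-r * math.sin(t), (r / 2) * math.cos(t))
    gx, gy = grad_f(x, y)
    dot_products.append(gx * tangent[0] + gy * tangent[1])

print(dot_products)  # each entry is zero up to floating-point rounding
```

A zero dot product means the gradient has no component along the level curve, which is another way of saying it points straight across it.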

Connection to the directional derivative

If $\mathbf{u}$ is a unit vector, then the Directional Derivative of $f$ in the direction $\mathbf{u}$ is

$$D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}$$

This formula is extremely important.

It says that the rate of change of $f$ in a chosen direction is the dot product of:

the gradient
the direction vector

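This immediately explains why the gradient gives the direction of greatest increase. Since $\mathbf{u}$ has length 1, $\nabla f \cdot \mathbf{u} = \|\nabla f\| \cos\theta$, where $\theta$ is the angle between $\nabla f$ and $\mathbf{u}$, so:

the dot product is largest when $\mathbf{u}$ points in the same direction as $\nabla f$
the dot product is smallest when $\mathbf{u}$ points in the opposite direction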
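The dot-product formula can be compared against a direct measurement of the rate of change along a direction. In this sketch the function `f`, its hand-derived gradient `grad_f`, the point, and the direction are all hypothetical choices made for illustration.

```python
import math
from typing import Tuple


def f(x: float, y: float) -> float:
    # Hypothetical function: f(x, y) = x**2 * y + y**3
    return x ** 2 * y + y ** 3


def grad_f(x: float, y: float) -> Tuple[float, float]:
    # Partial derivatives of f, computed by hand.
    return (2 * x * y, x ** 2 + 3 * y ** 2)


point = (1.0, 2.0)
direction = (3.0, 4.0)
norm = math.hypot(*direction)
u = (direction[0] / norm, direction[1] / norm)  # unit vector

# Rate of change from the formula: gradient dotted with the unit direction.
gx, gy = grad_f(*point)
dot_product_rate = gx * u[0] + gy * u[1]

# Rate of change measured directly by stepping a small distance along u.
h = 1e-5
ahead = f(point[0] + h * u[0], point[1] + h * u[1])
behind = f(point[0] - h * u[0], point[1] - h * u[1])
finite_difference_rate = (ahead - behind) / (2 * h)

print(dot_product_rate, finite_difference_rate)
```

The two numbers agree to several decimal places. Note that the direction is normalized first, because the formula assumes a unit vector.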

Magnitude of the gradient

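The magnitude

$$\|\nabla f\| = \sqrt{\left( \frac{\partial f}{\partial x_1} \right)^2 + \dots + \left( \frac{\partial f}{\partial x_n} \right)^2}$$

is the maximum possible directional derivative at that point.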

So the gradient gives both:

a direction
a maximum rate of increase
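Both claims can be checked by brute force: scan many unit directions, compute the directional derivative for each, and see which direction wins and by how much. The gradient value `(3.0, 4.0)` and the number of sampled directions are assumptions for this sketch.

```python
import math

grad = (3.0, 4.0)  # hypothetical gradient at some point

best_angle, best_rate = 0.0, -math.inf
worst_angle, worst_rate = 0.0, math.inf
num_directions = 3600
for k in range(num_directions):
    angle = 2 * math.pi * k / num_directions
    u = (math.cos(angle), math.sin(angle))  # unit vector at this angle
    rate = grad[0] * u[0] + grad[1] * u[1]  # directional derivative
    if rate > best_rate:
        best_angle, best_rate = angle, rate
    if rate < worst_rate:
        worst_angle, worst_rate = angle, rate

grad_angle = math.atan2(grad[1], grad[0])
grad_norm = math.hypot(*grad)

print(best_angle, grad_angle)  # best direction lines up with the gradient
print(best_rate, grad_norm)    # best rate is close to the gradient's magnitude
print(worst_rate, -grad_norm)  # worst rate is close to minus the magnitude
```

The agreement is only as fine as the angular sampling, so the scanned maximum sits slightly below the exact magnitude.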

Critical points

If $\nabla f(\mathbf{a}) = \mathbf{0}$, then the point $\mathbf{a}$ is called a critical point.

Critical points are important because they are candidates for:

local maxima
local minima
saddle points

So the gradient being zero means there is no first-order change in any direction.
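A zero gradient alone does not say which kind of critical point you have. The sketch below uses three hypothetical functions that all have a zero gradient at the origin, then looks at how each one changes after a small step along the x-axis and the y-axis.

```python
examples = {
    "local minimum": (lambda x, y: x ** 2 + y ** 2, lambda x, y: (2 * x, 2 * y)),
    "local maximum": (lambda x, y: -(x ** 2) - y ** 2, lambda x, y: (-2 * x, -2 * y)),
    "saddle point": (lambda x, y: x ** 2 - y ** 2, lambda x, y: (2 * x, -2 * y)),
}

step = 0.1
results = {}
for label, (f, grad_f) in examples.items():
    # The origin is a critical point if every gradient component is zero.
    is_critical = all(g == 0 for g in grad_f(0.0, 0.0))
    # Change in f after a small step along each axis.
    change_x = f(step, 0.0) - f(0.0, 0.0)
    change_y = f(0.0, step) - f(0.0, 0.0)
    results[label] = (is_critical, change_x, change_y)
    print(label, is_critical, change_x, change_y)
```

All three report a critical point, but the function rises in both directions for the minimum, falls in both for the maximum, and rises in one while falling in the other for the saddle.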

Intuition

The best mental model is:

The gradient is the multivariable version of the derivative.

For a one-variable function, the derivative tells you the slope.

For a multivariable function, the gradient tells you:

which direction is uphill
how steep the hill is

So the gradient is not just a number anymore. It has to be a vector, because in multiple dimensions there are many possible directions to move.

Ayush Garg
