The gradient of a scalar-valued function is the vector of its partial derivatives.
If $f$ is a function from $\mathbb{R}^n$ to $\mathbb{R}$, its gradient is the vector of partial derivatives, defined by

$$\nabla f = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n} \right).$$
So the gradient packages all the first-order rates of change of a function into one vector.
In 2 variables
If $f = f(x, y)$, then

$$\nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$$

where:
- $\frac{\partial f}{\partial x}$ is the partial derivative with respect to $x$
- $\frac{\partial f}{\partial y}$ is the partial derivative with respect to $y$
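The two partials can be approximated numerically with central differences; the sketch below is illustrative (the helper names, the step size $h$, and the test function are my choices, not from the text):

```python
# Numerical partial derivatives via central differences.
# The function f and step size h are illustrative choices.

def partial_x(f, x, y, h=1e-6):
    """Approximate df/dx at (x, y) with a central difference."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def partial_y(f, x, y, h=1e-6):
    """Approximate df/dy at (x, y) with a central difference."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

def gradient(f, x, y, h=1e-6):
    """The gradient packages both partials into one vector."""
    return (partial_x(f, x, y, h), partial_y(f, x, y, h))

f = lambda x, y: x**2 + 3 * x * y   # df/dx = 2x + 3y, df/dy = 3x
print(gradient(f, 1.0, 2.0))        # close to (8.0, 3.0)
```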
This is why the gradient is closely connected to Partial Derivatives.
Main idea
The gradient tells you two important things at a point:
- the direction of steepest increase
- how quickly the function increases in that direction
So if you want to increase the value of a function as fast as possible, you move in the direction of the gradient.
If you want to decrease the value as fast as possible, you move in the direction of $-\nabla f$.
This is the idea behind Gradient Descent.
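Gradient descent follows exactly this rule: repeatedly step a small amount against the gradient. A minimal sketch, where the function, its gradient, the starting point, and the learning rate are all illustrative assumptions:

```python
# Minimal gradient descent sketch: step opposite the gradient.
# f(x, y) = x**2 + y**2 is an illustrative choice; its gradient is (2x, 2y).

def grad_f(x, y):
    return (2 * x, 2 * y)

x, y = 3.0, -4.0
lr = 0.1                              # step size (learning rate)
for _ in range(100):
    gx, gy = grad_f(x, y)
    x, y = x - lr * gx, y - lr * gy   # move against the gradient

print(x, y)  # both approach 0, the minimum of f
```

Each step shrinks the point toward the minimum because moving against $\nabla f$ is the direction of fastest decrease.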
Example
Let $f(x, y) = x^2 + y^2$.
Then

$$\frac{\partial f}{\partial x} = 2x, \qquad \frac{\partial f}{\partial y} = 2y,$$

so

$$\nabla f = (2x,\ 2y).$$

At the point $(1, 2)$:

$$\nabla f(1, 2) = (2, 4).$$

This means that near $(1, 2)$, the function increases fastest in the direction $(2, 4)$.
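A hand-computed gradient like this can be checked with finite differences; here $f(x, y) = x^2 + y^2$ is used as the illustrative function, with gradient $(2x, 2y)$:

```python
# Check a hand-computed gradient with central differences.
# f(x, y) = x**2 + y**2 (illustrative); its exact gradient is (2x, 2y).

def f(x, y):
    return x**2 + y**2

h = 1e-6
x, y = 1.0, 2.0
gx = (f(x + h, y) - f(x - h, y)) / (2 * h)   # near 2x = 2
gy = (f(x, y + h) - f(x, y - h)) / (2 * h)   # near 2y = 4
print(gx, gy)
```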
Geometric meaning
For a function $f(x, y)$, you can picture the graph $z = f(x, y)$ as a surface.
But it is often easier to think in terms of level curves:
The gradient at a point is perpendicular to the level curve through that point.
So:
- level curves show where the function stays constant
- the gradient points straight across them in the direction where the function rises the fastest
This is one of the most important geometric interpretations of the gradient.
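The perpendicularity can be checked directly. For the illustrative function $f(x, y) = x^2 + y^2$, the level curves are circles, and a tangent direction to the circle through $(x, y)$ is $(-y, x)$; its dot product with the gradient should vanish:

```python
# The gradient is perpendicular to the level curve through a point.
# For f(x, y) = x**2 + y**2 (illustrative), level curves are circles
# x**2 + y**2 = c, and (-y, x) is tangent to the circle at (x, y).

def grad_f(x, y):
    return (2 * x, 2 * y)

x, y = 1.0, 2.0
tangent = (-y, x)                 # tangent to the circle x**2 + y**2 = 5
g = grad_f(x, y)
dot = g[0] * tangent[0] + g[1] * tangent[1]
print(dot)  # 0.0: the gradient is perpendicular to the level curve
```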
Connection to the directional derivative
If $\mathbf{u}$ is a unit vector, the directional derivative of $f$ in the direction $\mathbf{u}$ is

$$D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}.$$

This formula is extremely important.
It says that the rate of change of $f$ in the direction $\mathbf{u}$ is the dot product of:
- the gradient $\nabla f$
- the direction vector $\mathbf{u}$
This immediately explains why the gradient gives the direction of greatest increase:
- the dot product is largest when $\mathbf{u}$ points in the same direction as $\nabla f$
- the dot product is smallest when $\mathbf{u}$ points in the opposite direction
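The dot-product formula can be compared against the direct definition (the rate of change of $f$ along $\mathbf{u}$). The function and the chosen direction below are illustrative:

```python
# Directional derivative two ways: grad f . u vs. a finite difference along u.
# f(x, y) = x**2 + y**2 is an illustrative function with gradient (2x, 2y).
import math

def f(x, y):
    return x**2 + y**2

def grad_f(x, y):
    return (2 * x, 2 * y)

x, y = 1.0, 2.0
u = (1 / math.sqrt(2), 1 / math.sqrt(2))   # a unit vector

# Dot-product formula: D_u f = grad f . u
g = grad_f(x, y)
via_dot = g[0] * u[0] + g[1] * u[1]

# Direct definition: rate of change of f along u
h = 1e-6
via_limit = (f(x + h * u[0], y + h * u[1])
             - f(x - h * u[0], y - h * u[1])) / (2 * h)

print(via_dot, via_limit)  # both near (2 + 4) / sqrt(2) ≈ 4.2426
```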
Magnitude of the gradient
The magnitude $\|\nabla f\|$ of the gradient is the maximum possible directional derivative at that point.
So the gradient gives both:
- a direction
- the maximum rate of increase
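This can be seen numerically by sampling many unit directions and taking the largest directional derivative; it matches $\|\nabla f\|$, achieved in the gradient's own direction. The function and sampling resolution are illustrative:

```python
# Sampling unit directions: the largest directional derivative matches
# the gradient's magnitude. f(x, y) = x**2 + y**2 is illustrative.
import math

def grad_f(x, y):
    return (2 * x, 2 * y)

x, y = 1.0, 2.0
gx, gy = grad_f(x, y)
magnitude = math.hypot(gx, gy)           # ||grad f|| = sqrt(4 + 16)

best = max(
    gx * math.cos(t) + gy * math.sin(t)  # D_u f for u = (cos t, sin t)
    for t in (i * 2 * math.pi / 3600 for i in range(3600))
)
print(magnitude, best)  # best is within sampling error of the magnitude
```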
Critical points
If $\nabla f = \mathbf{0}$ at a point,
then the point is called a critical point.
Critical points are important because they are candidates for:
- local maxima
- local minima
- saddle points
So the gradient being zero means there is no first-order change in any direction.
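At a critical point the directional derivative is therefore zero along every direction, since $D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u} = 0$ for any $\mathbf{u}$. A small check, using the illustrative function $f(x, y) = x^2 + y^2$, which has a critical point (a minimum) at the origin:

```python
# At a critical point the gradient vanishes, so the directional derivative
# is zero in every direction. f(x, y) = x**2 + y**2 (illustrative) has a
# critical point at the origin.
import math

def grad_f(x, y):
    return (2 * x, 2 * y)

gx, gy = grad_f(0.0, 0.0)
print(gx, gy)  # (0.0, 0.0): no first-order change

# Directional derivative along an arbitrary unit vector is also zero:
u = (math.cos(1.0), math.sin(1.0))
print(gx * u[0] + gy * u[1])  # 0.0
```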
Intuition
The best mental model is:
The gradient is the multivariable version of the derivative.
For a one-variable function, the derivative tells you the slope.
For a multivariable function, the gradient tells you:
- which direction is uphill
- how steep the hill is
So the gradient is not just a number anymore. It has to be a vector, because in multiple dimensions there are many possible directions to move.