The gradient of a scalar-valued function is the vector of its partial derivatives.

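If

$$f : \mathbb{R}^n \to \mathbb{R}$$

is a function from $\mathbb{R}^n$ to $\mathbb{R}$, then its gradient is written as

$$\nabla f$$

and is defined by

$$\nabla f = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n} \right)$$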
So the gradient packages all the first-order rates of change of a function into one vector.
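To make the definition concrete, here is a minimal sketch that approximates each partial derivative with a central difference and collects the results into one list. The helper name `numerical_gradient`, the step size `h`, and the test function are assumptions chosen for illustration.

```python
def numerical_gradient(f, point, h=1e-6):
    """Approximate the gradient of f at `point` using central differences."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        # Central difference approximates the i-th partial derivative.
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


# Hypothetical test function: f(x, y, z) = x^2 + y*z
def f(p):
    x, y, z = p
    return x**2 + y * z


approx = numerical_gradient(f, [1.0, 2.0, 3.0])
# The analytic gradient is (2x, z, y), which is (2, 3, 2) at this point.
print(approx)
```

Each entry of the result is one first-order rate of change, so the list as a whole plays the role of the gradient vector.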

In 2 variables

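If

$$f = f(x, y)$$

then

$$\nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$$

where:

  • $\frac{\partial f}{\partial x}$ is the partial derivative with respect to $x$
  • $\frac{\partial f}{\partial y}$ is the partial derivative with respect to $y$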

This is why the gradient is closely connected to Partial Derivatives.

Main idea

The gradient tells you two important things at a point:

  • the direction of steepest increase
  • how quickly the function increases in that direction

So if you want to increase the value of a function as fast as possible, you move in the direction of the gradient.

If you want to decrease the value as fast as possible, you move in the direction of $-\nabla f$.

This is the idea behind Gradient Descent.
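The idea of repeatedly stepping against the gradient can be sketched as follows. This is a minimal illustration, not a production optimizer: the function $f(x, y) = (x - 1)^2 + (y + 2)^2$, the step size, and the number of steps are all assumptions chosen so that the minimum at $(1, -2)$ is easy to verify.

```python
def grad_f(x, y):
    # Gradient of the hypothetical function f(x, y) = (x - 1)^2 + (y + 2)^2
    return (2 * (x - 1), 2 * (y + 2))


def gradient_descent(start, step_size=0.1, steps=200):
    x, y = start
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        # Step against the gradient, the direction of steepest decrease.
        x -= step_size * gx
        y -= step_size * gy
    return x, y


x_min, y_min = gradient_descent((5.0, 5.0))
print(x_min, y_min)  # close to the minimizer (1, -2)
```

A fixed step size works here because the function is a simple quadratic bowl; other functions may need a smaller or adaptive step.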

Example

Let

Then

so

At the point ,

This means that near , the function increases most rapidly in the direction of the vector
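As one concrete illustration of these steps, consider a hypothetical function chosen here purely for illustration, $f(x, y) = x^2 + 3xy$, evaluated at the assumed point $(1, 2)$:

$$
\begin{aligned}
f(x, y) &= x^2 + 3xy \\
\frac{\partial f}{\partial x} &= 2x + 3y, \qquad \frac{\partial f}{\partial y} = 3x \\
\nabla f(x, y) &= (2x + 3y,\; 3x) \\
\nabla f(1, 2) &= (2 \cdot 1 + 3 \cdot 2,\; 3 \cdot 1) = (8, 3)
\end{aligned}
$$

Under these assumptions, near $(1, 2)$ the function increases most rapidly in the direction of $(8, 3)$, at a rate of $\sqrt{8^2 + 3^2} = \sqrt{73} \approx 8.54$ per unit distance.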

Geometric meaning

For a function $z = f(x, y)$, the graph is a surface in 3D.

But it is often easier to think in terms of level curves:

$$f(x, y) = c$$

The gradient at a point is perpendicular to the level curve through that point.

So:

  • level curves show where the function stays constant
  • the gradient points straight across them in the direction where the function rises the fastest

This is one of the most important geometric interpretations of the gradient.
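The perpendicularity claim can be checked numerically. The sketch below assumes the hypothetical function $f(x, y) = x^2 + y^2$, whose level curve $f = 4$ is a circle of radius 2, and confirms that the gradient has zero dot product with the tangent to that circle.

```python
import math


def grad_f(x, y):
    # Gradient of the hypothetical function f(x, y) = x^2 + y^2
    return (2 * x, 2 * y)


# The level curve f = 4 is the circle of radius 2, parametrized by angle t.
t = 0.7  # arbitrary point on the curve
x, y = 2 * math.cos(t), 2 * math.sin(t)

# Tangent to the circle at (x, y): derivative of the parametrization in t.
tangent = (-2 * math.sin(t), 2 * math.cos(t))

gx, gy = grad_f(x, y)
dot = gx * tangent[0] + gy * tangent[1]
print(dot)  # approximately 0, so the gradient is perpendicular to the curve
```

Changing `t` moves the point around the circle; the dot product stays at zero up to rounding.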

Connection to the directional derivative

If $\mathbf{u}$ is a unit vector, then the Directional Derivative of $f$ in the direction $\mathbf{u}$ is

$$D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}$$

This formula is extremely important.

It says that the rate of change of $f$ in a chosen direction is the dot product of:

  • the gradient
  • the direction vector

This immediately explains why the gradient gives the direction of greatest increase:

  • the dot product is largest when the unit vector $\mathbf{u}$ points in the same direction as $\nabla f$
  • the dot product is smallest when $\mathbf{u}$ points in the opposite direction, $-\nabla f$
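The reasoning behind these two bullets can be written out in one line. This is a sketch using the standard geometric form of the dot product, where $\theta$ (a symbol introduced here) is the angle between $\nabla f$ and the unit vector $\mathbf{u}$:

$$
D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u} = \|\nabla f\| \, \|\mathbf{u}\| \cos\theta = \|\nabla f\| \cos\theta
$$

Since $\|\mathbf{u}\| = 1$ and $-1 \le \cos\theta \le 1$, the value is largest ($\|\nabla f\|$) at $\theta = 0$ and smallest ($-\|\nabla f\|$) at $\theta = \pi$.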

Magnitude of the gradient

The magnitude

$$\|\nabla f\|$$

is the maximum possible directional derivative at that point.

So the gradient gives both:

  • a direction
  • a maximum rate of increase
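Both facts can be checked with a brute-force sweep over unit vectors. The sketch below assumes a hypothetical gradient value of $(3, 4)$ at some point and a grid of 3600 directions; it looks for the direction with the largest directional derivative and compares it with the gradient's own direction and magnitude.

```python
import math

grad = (3.0, 4.0)  # hypothetical gradient at some point; its magnitude is 5

best_rate, best_angle = -math.inf, None
for k in range(3600):
    angle = 2 * math.pi * k / 3600
    u = (math.cos(angle), math.sin(angle))  # unit vector at this angle
    rate = grad[0] * u[0] + grad[1] * u[1]  # directional derivative along u
    if rate > best_rate:
        best_rate, best_angle = rate, angle

norm = math.hypot(*grad)
grad_angle = math.atan2(grad[1], grad[0])
print(best_rate, norm)         # best rate is close to the magnitude
print(best_angle, grad_angle)  # best angle is close to the gradient's angle
```

The match is only as fine as the angular grid, so the two pairs of numbers agree approximately rather than exactly.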

Critical points

If

$$\nabla f = \mathbf{0}$$

then the point is called a critical point.

Critical points are important because they are candidates for:

  • local maxima
  • local minima
  • saddle points

So the gradient being zero means there is no first-order change in any direction.
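A zero gradient does not by itself say which kind of critical point you have. The sketch below assumes the hypothetical function $f(x, y) = x^2 - y^2$, whose gradient vanishes at the origin, and shows that the function still rises along one axis and falls along the other, which is the behavior of a saddle point.

```python
def f(x, y):
    # Hypothetical function with a saddle point at the origin
    return x**2 - y**2


def grad_f(x, y):
    return (2 * x, -2 * y)


gx, gy = grad_f(0.0, 0.0)
is_critical = (gx == 0 and gy == 0)

eps = 1e-3
along_x = f(eps, 0.0) - f(0.0, 0.0)  # positive: f rises along the x-axis
along_y = f(0.0, eps) - f(0.0, 0.0)  # negative: f falls along the y-axis

print(is_critical, along_x, along_y)
```

The changes along each axis are of order `eps**2`, which is consistent with there being no first-order change at a critical point.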

Intuition

The best mental model is:

The gradient is the multivariable version of the derivative.

For a one-variable function, the derivative tells you the slope.

For a multivariable function, the gradient tells you:

  • which direction is uphill
  • how steep the hill is

So the gradient is not just a number anymore. It has to be a vector, because in multiple dimensions there are many possible directions to move.