Layer Normalization

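Layer Normalization normalizes an activation vector across its features: each vector is standardized using its own mean and variance, independently of the other examples in the batch.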

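For an activation vector $x = (x_1, \dots, x_d)$ with $d$ features, first compute the mean and variance over the features:

$$\mu = \frac{1}{d} \sum_{i=1}^{d} x_i, \qquad \sigma^2 = \frac{1}{d} \sum_{i=1}^{d} (x_i - \mu)^2$$

Then each component is normalized:

$$\hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}}$$

where $\epsilon$ is a small constant that keeps the division numerically stable.

With learned scale and shift parameters $\gamma$ and $\beta$ (one value per feature), the output is:

$$y_i = \gamma_i \hat{x}_i + \beta_i$$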
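The three steps above (mean and variance, normalization, learned scale and shift) can be sketched in a few lines of dependency-free Python. This is a minimal illustration for a single vector, not a library implementation; the function name `layer_norm`, the default `eps=1e-5`, and the sample values are assumptions made for this example.

```python
import math

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize one activation vector, then apply a per-feature scale and shift.

    x, gamma and beta are lists of the same length; eps is an assumed
    small constant that guards against division by zero.
    """
    d = len(x)
    mean = sum(x) / d
    # Population variance: divide by d, not d - 1.
    var = sum((v - mean) ** 2 for v in x) / d
    inv_std = 1.0 / math.sqrt(var + eps)
    x_hat = [(v - mean) * inv_std for v in x]
    return [g * h + b for g, h, b in zip(gamma, x_hat, beta)]

# Illustrative input; gamma = 1 and beta = 0 leave the normalized values unchanged.
x = [2.0, 4.0, 6.0, 8.0]
y = layer_norm(x, gamma=[1.0] * 4, beta=[0.0] * 4)
```

With `gamma` set to ones and `beta` to zeros, `y` has a mean of zero and a variance very close to one, whatever the scale of `x`.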

Intuition

Layer Norm asks: “For this activation vector, how far is each value from the vector’s average?”

It recenters the vector around zero and rescales it so that its spread is stable (a variance close to 1), regardless of how large or small the original values were.
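As a small worked example, take the vector $x = (2, 4, 6, 8)$. The numbers are chosen purely for illustration, and the small stability constant $\epsilon$ is ignored to keep the arithmetic readable.

$$\mu = \frac{2 + 4 + 6 + 8}{4} = 5, \qquad \sigma^2 = \frac{(-3)^2 + (-1)^2 + 1^2 + 3^2}{4} = 5, \qquad \sigma = \sqrt{5} \approx 2.236$$

$$\hat{x} = \frac{x - \mu}{\sigma} \approx (-1.342,\ -0.447,\ 0.447,\ 1.342)$$

The result is centered on zero and has unit variance. Multiplying the input by 10 would give exactly the same normalized vector, which is the sense in which the spread is stable.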

This helps prevent activations from becoming too large or too small during training.

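After normalization, the vector has a fixed center and scale (mean 0, variance close to 1). That can be too restrictive, because the next layer may work better with a different range.

So Layer Norm adds two learnable parameters, a scale $\gamma$ and a shift $\beta$, and lets the network learn the best scale and shift after normalization:

$$y_i = \gamma_i \hat{x}_i + \beta_i$$

Here $\hat{x}_i$ is the normalized component, and $\gamma_i$ and $\beta_i$ are the learned scale and shift for feature $i$.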

So Layer Norm stabilizes the activations, but still lets the model recover useful scales and offsets.
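To see why the scale and shift step does not throw information away, the sketch below normalizes one vector and then picks a scale and shift that undo the normalization exactly. This is only an illustration: the helper `normalize`, the value of `eps`, and the sample input are assumptions, and a single scalar `gamma` and `beta` are derived from this one input, whereas a trained layer learns fixed per-feature parameters by gradient descent.

```python
import math

def normalize(x, eps=1e-5):
    """Return the normalized vector plus the statistics used to compute it."""
    d = len(x)
    mean = sum(x) / d
    var = sum((v - mean) ** 2 for v in x) / d  # population variance
    x_hat = [(v - mean) / math.sqrt(var + eps) for v in x]
    return x_hat, mean, var

eps = 1e-5  # assumed stability constant
x = [2.0, 4.0, 6.0, 8.0]
x_hat, mean, var = normalize(x, eps)

# Choosing the scale and shift by hand (for illustration only):
# gamma restores the original spread, beta restores the original center.
gamma = math.sqrt(var + eps)
beta = mean
restored = [gamma * h + beta for h in x_hat]
```

Here `restored` matches `x` up to floating-point rounding, which shows that the affine step can represent the original scale and offset if that is what training favors.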