RMS Norm

RMS Norm is a normalization method commonly used in Transformer models.

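It rescales an n-dimensional activation vector x by dividing it by its root mean square (RMS) and then multiplying each entry by a learned gain g_i:

$$
\bar{x}_i = \frac{x_i}{\mathrm{RMS}(x)} \, g_i
$$

The denominator is the root mean square of the entries:

$$
\mathrm{RMS}(x) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} x_i^2}
$$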
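A minimal plain-Python sketch of the operation: divide by the root mean square, then multiply by a per-dimension gain. It assumes the input is a list of floats and the gain `g` is a list of the same length (defaulting to all ones). The `eps` term inside the square root is an assumption borrowed from common implementations, added to avoid dividing by zero; `1e-6` is an illustrative value, not one taken from this text.

```python
import math

def rms_norm(x, g=None, eps=1e-6):
    """Divide x by its root mean square, then scale each entry by the gain g."""
    n = len(x)
    if g is None:
        g = [1.0] * n  # identity gain when no learned weights are supplied
    # Root mean square: sqrt of the mean of the squared entries.
    # eps (an assumed stabilizer) keeps the denominator nonzero for an all-zero input.
    rms = math.sqrt(sum(v * v for v in x) / n + eps)
    return [gi * v / rms for gi, v in zip(g, x)]

print(rms_norm([3.0, 4.0]))
```

Here the input [3.0, 4.0] has RMS √12.5 ≈ 3.536, so the output is roughly [0.849, 1.131], whose own RMS is about 1.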

Intuition

RMS Norm controls the scale of the activation vector without re-centering it; the mean is never subtracted.
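A quick check of this in plain Python, using a hypothetical `rms_norm` helper with unit gain and no stabilizing epsilon (so it assumes a nonzero input): rescaling the input by a positive constant leaves the output unchanged, while an input centered around 10 still has a mean close to 1 afterwards, since nothing is subtracted.

```python
import math

def rms_norm(x):
    # Unit gain and no eps: assumes x is not the zero vector.
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return [v / rms for v in x]

x = [9.0, 10.0, 11.0]           # mean 10, small spread
scaled = [5.0 * v for v in x]   # same direction, five times larger

a = rms_norm(x)
b = rms_norm(scaled)

# Scale invariance: a positive rescaling of the input does not change the output.
print(all(math.isclose(u, v) for u, v in zip(a, b)))  # True

# No re-centering: the output mean stays far from zero.
print(sum(a) / len(a))  # about 0.997
```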

Instead of asking “how far is each value from the mean?”, RMS Norm asks “how large are the values, on average?”
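The two questions correspond to two statistics of the n entries x_i: the standard deviation σ, which measures spread around the mean μ and is what Layer Normalization divides by, and the root mean square RMS(x), which is what RMS Norm divides by. Expanding the square relates them:

$$
\mathrm{RMS}(x)^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2
= \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2 + \mu^2
= \sigma^2 + \mu^2,
\qquad
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i
$$

So the two denominators coincide exactly when the mean is zero; otherwise RMS(x) also absorbs the size of the mean.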

Because it only normalizes by magnitude, RMS Norm is simpler than Layer Normalization: there is no mean to compute or subtract.

Compared with Layer Normalization, RMS Norm removes the mean-centering step and usually removes the learned bias term as well, keeping only a learned per-dimension gain.
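For comparison, a plain-Python sketch of both operations side by side. The helper names `layer_norm` and `rms_norm`, the list-of-floats inputs, and the illustrative `eps = 1e-6` are assumptions for this example. Layer Normalization subtracts the mean, divides by the standard deviation, and applies a gain `g` and bias `b`; RMS Norm skips the subtraction and the bias.

```python
import math

def layer_norm(x, g, b, eps=1e-6):
    n = len(x)
    mu = sum(x) / n                              # mean-centering step
    var = sum((v - mu) ** 2 for v in x) / n      # population variance
    sigma = math.sqrt(var + eps)
    return [gi * (v - mu) / sigma + bi for gi, bi, v in zip(g, b, x)]

def rms_norm(x, g, eps=1e-6):
    n = len(x)
    rms = math.sqrt(sum(v * v for v in x) / n + eps)  # no mean is subtracted
    return [gi * v / rms for gi, v in zip(g, x)]      # gain only, no bias

g = [1.0, 1.0, 1.0, 1.0]
b = [0.0, 0.0, 0.0, 0.0]

zero_mean = [-3.0, -1.0, 1.0, 3.0]
print(layer_norm(zero_mean, g, b))
print(rms_norm(zero_mean, g))        # same values: the mean is already zero

shifted = [v + 10.0 for v in zero_mean]
print(layer_norm(shifted, g, b))     # re-centered: identical to the zero-mean case
print(rms_norm(shifted, g))          # not re-centered: all entries positive
```

On a zero-mean input the two agree; once the input is shifted, only Layer Normalization removes the shift.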