Mean squared error loss (MSE) measures the average squared difference between a model’s predictions and the true target values.

  MSE = (1/n) · Σᵢ (ŷᵢ − yᵢ)²

where:

  • ŷᵢ = predicted value
  • yᵢ = true target value
  • n = number of examples
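
The definition above can be sketched directly in NumPy; the function and variable names here are illustrative, not from a particular library:

```python
# Minimal sketch of MSE: average of squared differences between
# predictions and targets.
import numpy as np

def mse(y_pred, y_true):
    """Mean squared error over n examples."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return np.mean((y_pred - y_true) ** 2)

preds = [2.5, 0.0, 2.0, 8.0]
targets = [3.0, -0.5, 2.0, 7.0]
print(mse(preds, targets))  # (0.25 + 0.25 + 0.0 + 1.0) / 4 = 0.375
```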

Intuition

MSE asks: “On average, how far are the predictions from the true values, after squaring the errors?”

Squaring the error has two important effects:

  • Negative and positive errors do not cancel out.
  • Large errors are penalized more heavily than small errors.

For example, an error of 10 contributes 100 to the loss, while an error of 1 contributes only 1.

Gradient

For one prediction:

  ∂/∂ŷ (ŷ − y)² = 2(ŷ − y)

For the averaged loss over n examples:

  ∂MSE/∂ŷᵢ = (2/n)(ŷᵢ − yᵢ)
This means the update signal grows linearly with the prediction error.
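
The analytic gradient above can be checked numerically. A common sanity check (sketched here with made-up values) compares it against a central finite-difference estimate:

```python
# Verify the analytic MSE gradient (2/n)(ŷᵢ − yᵢ) against a
# central finite-difference estimate of ∂MSE/∂ŷᵢ.
import numpy as np

def mse(y_pred, y_true):
    return np.mean((y_pred - y_true) ** 2)

def mse_grad(y_pred, y_true):
    n = y_pred.shape[0]
    return 2.0 * (y_pred - y_true) / n

y_pred = np.array([1.0, 2.0, 3.0])
y_true = np.array([0.5, 2.5, 2.0])

analytic = mse_grad(y_pred, y_true)

eps = 1e-6
numeric = np.zeros_like(y_pred)
for i in range(len(y_pred)):
    up, down = y_pred.copy(), y_pred.copy()
    up[i] += eps
    down[i] -= eps
    numeric[i] = (mse(up, y_true) - mse(down, y_true)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))  # True
```

Note that the gradient is proportional to the error itself, which is exactly the "linear update signal" described above.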

When to Use

  • Regression problems, where the target is a continuous value.
  • Continuous-action behavior cloning.
  • Models where large mistakes should be punished strongly.
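
As a small regression example, MSE can drive plain gradient descent on a one-parameter linear model. The data, learning rate, and step count below are made up for illustration:

```python
# Fit y = w * x by gradient descent on the MSE loss.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x  # true slope is 3; no noise, for simplicity

w = 0.0
lr = 0.5
for _ in range(100):
    y_pred = w * x
    # Chain rule: dMSE/dw = mean(2 * (y_pred - y) * x)
    grad = np.mean(2.0 * (y_pred - y) * x)
    w -= lr * grad

print(round(w, 3))  # → 3.0, the true slope
```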

Characteristics

  • Smooth and differentiable everywhere, so it works well with gradient-based training such as backpropagation.
  • Sensitive to outliers because errors are squared.
  • Minimized by the mean of the targets, whereas mean absolute error is minimized by the median; the two coincide for symmetric error distributions, but MSE gives much larger gradients for large mistakes.
  • Commonly used as a cost function for regression.
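
The outlier sensitivity is easy to see by evaluating both losses on the same errors, one of which is an outlier (the numbers are illustrative):

```python
# Compare MSE and MAE on the same error vector containing one outlier.
import numpy as np

errors = np.array([0.1, -0.2, 0.1, 10.0])  # last entry is an outlier

mse_val = np.mean(errors ** 2)
mae_val = np.mean(np.abs(errors))

print(mse_val)  # ≈ 25.015 — dominated by the outlier's squared error (100)
print(mae_val)  # ≈ 2.6    — the outlier contributes only linearly
```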

Compared to Cross Entropy

MSE is typically used for continuous numeric targets, while cross-entropy loss is typically used for classification, where the model outputs probabilities.