Mean squared error loss (MSE) measures the average squared difference between a model’s predictions and the true target values.
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2$$
where $\hat{y}_i$ = predicted value, $y_i$ = true target value, and $n$ = number of examples.
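The formula above can be sketched in a few lines of NumPy (the function name `mse` and the sample values are our own, for illustration):

```python
import numpy as np

def mse(y_pred, y_true):
    """Mean squared error: average of the squared prediction errors."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean((y_pred - y_true) ** 2))

# Errors are -0.5, 0.5, and 0.0, so MSE = (0.25 + 0.25 + 0.0) / 3
print(mse([2.5, 0.0, 2.0], [3.0, -0.5, 2.0]))
```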
Intuition
MSE asks: “On average, how far are the predictions from the true values, after squaring the errors?”
Squaring the error has two important effects:
- Negative and positive errors do not cancel out.
- Large errors are penalized more heavily than small errors.
For example, an error of 2 contributes 4 to the loss, while an error of 10 contributes 100, so a single large error can dominate many small ones.
Gradient
For one prediction:
$$\frac{\partial}{\partial \hat{y}} (\hat{y} - y)^2 = 2(\hat{y} - y)$$
For the averaged loss over $n$ examples:
$$\frac{\partial\,\mathrm{MSE}}{\partial \hat{y}_i} = \frac{2}{n}(\hat{y}_i - y_i)$$
This means the update signal grows linearly with the prediction error.
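One way to sanity-check the gradient formula is to compare it against a finite-difference approximation (a minimal sketch; the helper names and test values are ours):

```python
import numpy as np

def mse(y_pred, y_true):
    return float(np.mean((y_pred - y_true) ** 2))

def mse_grad(y_pred, y_true):
    # Analytic gradient of the averaged loss w.r.t. each prediction:
    # (2/n) * (y_pred - y_true), so it grows linearly with the error.
    return 2.0 * (y_pred - y_true) / y_pred.size

y_pred = np.array([1.0, 2.0, 4.0])
y_true = np.array([1.5, 2.0, 3.0])
g = mse_grad(y_pred, y_true)

# Finite-difference check on the first prediction.
eps = 1e-6
bumped = y_pred.copy()
bumped[0] += eps
numeric = (mse(bumped, y_true) - mse(y_pred, y_true)) / eps
print(g[0], numeric)  # the two values should agree closely
```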
When to Use
- Regression problems, where the target is a continuous value.
- Continuous-action behavior cloning.
- Models where large mistakes should be punished strongly.
Characteristics
- Smooth and differentiable everywhere, so it works well with backpropagation and other gradient-based optimization.
- Sensitive to outliers because errors are squared.
- Its minimizer is the mean of the targets, while mean absolute error's is the median; the two coincide for symmetric error distributions, but MSE gives much larger gradients for large mistakes.
- Commonly used as a cost function for regression.
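The outlier sensitivity noted above is easy to demonstrate: adding one large error inflates MSE far more than mean absolute error. A small sketch with made-up values:

```python
import numpy as np

def mse(p, t):
    return float(np.mean((p - t) ** 2))

def mae(p, t):
    return float(np.mean(np.abs(p - t)))

y_true = np.zeros(5)
clean = np.array([0.1, -0.1, 0.2, -0.2, 0.1])
outlier = clean.copy()
outlier[-1] = 10.0  # one large mistake

# MSE blows up far more than MAE when the outlier appears,
# because the 10.0 error contributes 100 to the squared sum.
print(f"MSE: {mse(clean, y_true):.3f} -> {mse(outlier, y_true):.3f}")
print(f"MAE: {mae(clean, y_true):.3f} -> {mae(outlier, y_true):.3f}")
```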
Compared to Cross Entropy
MSE is usually used for continuous numeric targets, while Cross Entropy Loss is usually used for classification with probability outputs.