Mean squared error loss (MSE) measures the average squared difference between a model’s predictions and the true target values.

  MSE = (1/n) · Σᵢ (ŷᵢ − yᵢ)²

where:

  • ŷᵢ = predicted value
  • yᵢ = true target value
  • n = number of examples
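
The definition above can be sketched directly in NumPy; the function and variable names here are illustrative, not from a particular library:

```python
# Minimal sketch of MSE: average of squared differences between
# predictions and targets.
import numpy as np

def mse(y_pred, y_true):
    """Mean squared error over n examples."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return np.mean((y_pred - y_true) ** 2)

preds = [2.5, 0.0, 2.0, 8.0]
targets = [3.0, -0.5, 2.0, 7.0]
print(mse(preds, targets))  # (0.25 + 0.25 + 0.0 + 1.0) / 4 = 0.375
```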

Intuition

MSE asks: “On average, how far are the predictions from the true values, after squaring the errors?”

Squaring the error has two important effects:

  • Negative and positive errors do not cancel out.
  • Large errors are penalized more heavily than small errors.

For example, an error of 10 contributes 100 to the loss, while an error of 1 contributes only 1.

Gradient

For one prediction:

  ∂/∂ŷ (ŷ − y)² = 2(ŷ − y)

For the averaged loss over n examples:

  ∂MSE/∂ŷᵢ = (2/n)(ŷᵢ − yᵢ)
This means the update signal grows linearly with the prediction error.
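
The analytic gradient above can be checked numerically. A common sanity check (sketched here with made-up values) compares it against a central finite-difference estimate:

```python
# Verify the analytic MSE gradient (2/n)(ŷᵢ − yᵢ) against a
# central finite-difference estimate of ∂MSE/∂ŷᵢ.
import numpy as np

def mse(y_pred, y_true):
    return np.mean((y_pred - y_true) ** 2)

def mse_grad(y_pred, y_true):
    n = y_pred.shape[0]
    return 2.0 * (y_pred - y_true) / n

y_pred = np.array([1.0, 2.0, 3.0])
y_true = np.array([0.5, 2.5, 2.0])

analytic = mse_grad(y_pred, y_true)

eps = 1e-6
numeric = np.zeros_like(y_pred)
for i in range(len(y_pred)):
    up, down = y_pred.copy(), y_pred.copy()
    up[i] += eps
    down[i] -= eps
    numeric[i] = (mse(up, y_true) - mse(down, y_true)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))  # True
```

Note that the gradient is proportional to the error itself, which is exactly the "linear update signal" described above.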

When to Use

  • Regression problems, where the target is a continuous value.
  • Continuous-action behavior cloning.
  • Models where large mistakes should be punished strongly.
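
As a small regression example, MSE can drive plain gradient descent on a one-parameter linear model. The data, learning rate, and step count below are made up for illustration:

```python
# Fit y = w * x by gradient descent on the MSE loss.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x  # true slope is 3; no noise, for simplicity

w = 0.0
lr = 0.5
for _ in range(100):
    y_pred = w * x
    # Chain rule: dMSE/dw = mean(2 * (y_pred - y) * x)
    grad = np.mean(2.0 * (y_pred - y) * x)
    w -= lr * grad

print(round(w, 3))  # → 3.0, the true slope
```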

Characteristics

  • Smooth and differentiable everywhere, so it works well with gradient-based training such as backpropagation.
  • Sensitive to outliers because errors are squared.
  • Minimized by the mean of the targets, whereas mean absolute error is minimized by the median; the two coincide for symmetric error distributions, but MSE gives much larger gradients for large mistakes.
  • Commonly used as a cost function for regression.
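
The outlier sensitivity is easy to see by evaluating both losses on the same errors, one of which is an outlier (the numbers are illustrative):

```python
# Compare MSE and MAE on the same error vector containing one outlier.
import numpy as np

errors = np.array([0.1, -0.2, 0.1, 10.0])  # last entry is an outlier

mse_val = np.mean(errors ** 2)
mae_val = np.mean(np.abs(errors))

print(mse_val)  # ≈ 25.015 — dominated by the outlier's squared error (100)
print(mae_val)  # ≈ 2.6    — the outlier contributes only linearly
```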

Compared to Cross Entropy

MSE is typically used for continuous numeric targets, while cross-entropy loss is typically used for classification, where the model outputs probabilities.