A surrogate loss is a loss function that is easier to optimize than the true objective we actually care about.

The true objective may be non-differentiable, expensive to evaluate, unstable to optimize directly, or only observable through samples. A surrogate loss gives the model a practical training signal that points in roughly the right direction.
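A standard example is classification accuracy. It corresponds to the 0-1 loss, which has zero gradient almost everywhere and so gives gradient descent nothing to follow. The sketch below compares it with two common surrogates, the hinge loss and the logistic loss. It writes each as a function of the margin m = y·f(x) with labels y in {-1, +1}. The function names are illustrative and do not come from any particular library.

```python
import math

def zero_one_loss(margin: float) -> float:
    # True objective: 1 if misclassified (margin <= 0), else 0.
    # Piecewise constant, so its gradient is zero wherever it exists.
    return 1.0 if margin <= 0 else 0.0

def hinge_loss(margin: float) -> float:
    # Convex surrogate (as in SVMs); sub-differentiable at margin == 1.
    return max(0.0, 1.0 - margin)

def logistic_loss(margin: float) -> float:
    # Smooth surrogate (as in logistic regression). Using log base 2 makes
    # the loss equal 1 at margin 0, so it upper-bounds the 0-1 loss.
    return math.log2(1.0 + math.exp(-margin))

for m in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    print(f"m={m:+.1f}  0-1={zero_one_loss(m):.0f}  "
          f"hinge={hinge_loss(m):.3f}  logistic={logistic_loss(m):.3f}")
```

Both surrogates upper-bound the 0-1 loss and shrink as the margin grows. Pushing them down therefore tends to reduce misclassifications, which is the sense in which they point in roughly the right direction.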

Good surrogate losses

A useful surrogate loss should be:

  • differentiable or sub-differentiable
  • cheap enough to compute during training
  • correlated with the true objective
  • stable under optimization
  • hard to exploit in unintended ways

The last point matters because models optimize exactly what the loss rewards, not what we meant the loss to represent.

Failure mode

A surrogate can fail when improving the proxy no longer improves the real objective.
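A toy one-dimensional sketch shows the shape of this failure. It assumes, purely for illustration, a hypothetical true objective true_score(x) = -(x - 1)² and a hypothetical proxy proxy(x) = x. The two agree while x < 1. Gradient ascent on the proxy therefore helps at first, then keeps raising the proxy while the true score falls.

```python
def true_score(x: float) -> float:
    # Hypothetical true objective, maximized at x = 1.
    return -(x - 1.0) ** 2

def proxy(x: float) -> float:
    # Hypothetical proxy: tracks the true objective only while x < 1.
    return x

def proxy_grad(x: float) -> float:
    # d(proxy)/dx is constant, so the optimizer never stops pushing x up.
    return 1.0

x, lr = 0.0, 0.25
history = []
for step in range(8):
    history.append((step, x, proxy(x), true_score(x)))
    x += lr * proxy_grad(x)  # gradient ascent on the proxy only

for step, xi, p, t in history:
    print(f"step {step}: x={xi:.2f}  proxy={p:.2f}  true={t:.4f}")
```

The proxy rises at every step. The true score peaks at x = 1 and then declines, even though nothing in the training signal changed.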

Examples:

  • lower cross-entropy does not always mean better-calibrated or more useful predictions
  • lower imitation loss does not always mean higher task success
  • higher reward-model score does not always mean outputs that humans actually prefer
  • larger policy update may improve the surrogate but hurt performance in actual rollouts
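The last example motivates clipped policy-gradient objectives. One common remedy, which the text above does not name, is the clipped surrogate from Proximal Policy Optimization (PPO). It uses the probability ratio r = π_new(a|s) / π_old(a|s), an advantage estimate A, and a clip range ε. The per-sample objective to maximize is min(r·A, clip(r, 1 - ε, 1 + ε)·A). Below is a minimal sketch assuming ε = 0.2; the helper names are hypothetical.

```python
def clip(value: float, low: float, high: float) -> float:
    return max(low, min(value, high))

def unclipped_surrogate(ratio: float, advantage: float) -> float:
    # Plain importance-weighted surrogate: keeps growing with the ratio when A > 0.
    return ratio * advantage

def clipped_surrogate(ratio: float, advantage: float, eps: float = 0.2) -> float:
    # PPO-style clipped objective (to be maximized). Once the ratio leaves
    # [1 - eps, 1 + eps] in the direction the advantage favors, the value
    # stops improving, so there is no incentive to move the policy further.
    return min(ratio * advantage, clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)

for r in [0.5, 1.0, 1.2, 1.5, 3.0]:
    print(f"ratio={r:.1f}  unclipped={unclipped_surrogate(r, 1.0):.2f}  "
          f"clipped={clipped_surrogate(r, 1.0):.2f}")
```

With A > 0, the unclipped surrogate always rewards a larger ratio, which is exactly the "larger policy update" failure. The clipped version flattens at 1 + ε. With A < 0 it flattens symmetrically at 1 - ε.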

In each case the model is optimizing the proxy instead of the real goal, an instance of what is often called Goodhart's law.