A surrogate loss is a loss function that is easier to optimize than the true objective we actually care about.
The true objective may be non-differentiable, expensive to evaluate, unstable to optimize directly, or only observable through samples. A surrogate loss gives the model a practical training signal that points in roughly the right direction.
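As a minimal sketch of this idea, assume a one-parameter linear classifier whose true objective is the zero-one error rate. That error rate is piecewise constant in the weight, so it gives no gradient. The logistic loss is a common differentiable stand-in. The function names, the toy data, and the learning rate below are illustrative assumptions, not part of the original note.

```python
import math

def zero_one_loss(w, data):
    """True objective: fraction of misclassified points (labels are +1 or -1)."""
    return sum(1 for x, y in data if y * w * x <= 0) / len(data)

def logistic_loss(w, data):
    """Surrogate: smooth, differentiable stand-in for the zero-one loss."""
    return sum(math.log1p(math.exp(-y * w * x)) for x, y in data) / len(data)

def logistic_grad(w, data):
    """Derivative of logistic_loss with respect to w."""
    return sum(-y * x / (1.0 + math.exp(y * w * x)) for x, y in data) / len(data)

# Toy data (an assumption for illustration): the sign of x matches the label.
data = [(-2.0, -1), (-1.0, -1), (1.0, 1), (2.0, 1)]

w_start = -0.5  # wrong sign, so every point starts out misclassified
w = w_start
for _ in range(100):
    # Gradient descent on the surrogate; the zero-one loss offers no gradient.
    w -= 0.5 * logistic_grad(w, data)

print("zero-one loss before:", zero_one_loss(w_start, data))
print("zero-one loss after: ", zero_one_loss(w, data))
```

In this toy case, descending the surrogate also drives the true error down, which is the behavior a surrogate is meant to provide. Nothing guarantees that in general.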
Good surrogate losses
A useful surrogate loss should be:
- differentiable or sub-differentiable
- cheap enough to compute during training
- correlated with the true objective
- stable under optimization
- hard to exploit in unintended ways
The last point matters because models optimize exactly what the loss rewards, not what we meant the loss to represent.
Failure mode
A surrogate can fail when improving the proxy no longer improves the real objective.
Examples:
- lower cross-entropy does not always mean better-calibrated or more useful predictions
- a better imitation loss does not always mean better task success
- a higher reward model score does not always mean the output is actually preferred by humans
- a larger policy update may improve the surrogate but damage actual rollout performance
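This is called optimizing the proxy instead of the real goal. It is often described by Goodhart's law: when a measure becomes a target, it ceases to be a good measure.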
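The step-size failure in the last example can be illustrated with a toy sketch. It assumes a hypothetical one-dimensional true objective and uses its first-order approximation around the current parameter as the surrogate. Both functions are made up for illustration and are not an actual policy-gradient implementation.

```python
def true_objective(theta):
    """Hypothetical real goal: best at theta = 1, worse further away."""
    return -(theta - 1.0) ** 2

def surrogate(theta, theta_old=0.0):
    """First-order approximation of true_objective around theta_old."""
    slope = -2.0 * (theta_old - 1.0)  # derivative of true_objective at theta_old
    return true_objective(theta_old) + slope * (theta - theta_old)

# Take increasingly large steps away from theta_old = 0 and compare both scores.
for step in (0.5, 1.0, 2.0, 4.0):
    print(
        f"step={step:.1f}  "
        f"surrogate={surrogate(step):+.2f}  "
        f"true={true_objective(step):+.2f}"
    )
```

The surrogate keeps rising as the step grows, while the true objective peaks at a step of 1.0 and then falls. The proxy is only trustworthy near the point where it was built, which is one reason to limit update size in practice.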