Adversarial training is a method for making a model more robust by training it not just on normal examples, but also on intentionally perturbed inputs designed to fool it.
These perturbed inputs are called adversarial examples.
Core idea
Instead of only minimizing the loss on clean training data, adversarial training also tries to minimize the loss on the “worst” small perturbations of that data.
So during training:
- start with a normal input
- generate a modified version that makes the model more likely to fail
- train the model on both the clean and adversarial versions
This makes the model learn decision boundaries that are less fragile.
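The steps above can be sketched for a very simple model. The snippet below is a minimal illustration, not a production method: it uses the fast gradient sign method (FGSM), one common way to generate the "modified version that makes the model more likely to fail", on a hand-written logistic regression. The weights, inputs, and `eps` value are made up for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(p, y):
    # Binary cross-entropy for a single example
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def fgsm_perturb(x, y, w, b, eps):
    """Fast gradient sign method for logistic regression:
    nudge x by eps in the direction that increases the loss."""
    p = sigmoid(x @ w + b)     # predicted probability of class 1
    grad_x = (p - y) * w       # gradient of the loss w.r.t. the input
    return x + eps * np.sign(grad_x)

# Illustrative weights and a clean input (values chosen arbitrarily)
w = np.array([2.0, -1.0])
b = 0.0
x = np.array([0.5, 0.5])
y = 1.0

x_adv = fgsm_perturb(x, y, w, b, eps=0.1)
# The perturbed input has a higher loss than the clean one,
# even though it differs from x by at most 0.1 per coordinate.
```

Training would then compute the loss on both `x` and `x_adv` before updating the weights, which is the "train on both versions" step in the list above.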
Why it works
Many machine learning models, especially neural networks, can be sensitive to tiny input changes that are almost invisible to humans but still cause wrong predictions.
Adversarial training exposes the model to these difficult cases during learning, so the model becomes harder to attack.
In that sense, adversarial training is similar to a targeted form of data augmentation, except the added examples are chosen specifically to be difficult for the current model.
Optimization view
Ordinary training tries to reduce prediction error on the training set.
Adversarial training is closer to a min-max problem:
- an inner step searches for a perturbation that increases the loss
- an outer step updates the model parameters to reduce that worst-case loss
The outer update is still usually done with methods based on gradient descent.
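A minimal sketch of this min-max loop, again using logistic regression and FGSM as the inner maximizer (these choices, and all constants, are illustrative assumptions): the inner step builds a loss-increasing perturbation for each example, and the outer step runs one gradient descent update on the resulting adversarial loss.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adv_train_step(w, b, X, y, eps, lr):
    """One adversarial training step for logistic regression."""
    # Inner step: per-example FGSM perturbation that increases the loss.
    p = sigmoid(X @ w + b)
    grad_X = (p - y)[:, None] * w          # dloss/dx for each example
    X_adv = X + eps * np.sign(grad_X)

    # Outer step: gradient descent on the adversarial (worst-case) loss.
    p_adv = sigmoid(X_adv @ w + b)
    grad_w = X_adv.T @ (p_adv - y) / len(y)
    grad_b = np.mean(p_adv - y)
    return w - lr * grad_w, b - lr * grad_b

# Tiny made-up dataset: two classes separated along both axes.
X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
y = np.array([1.0, 1.0, 0.0, 0.0])

w, b = np.zeros(2), 0.0
for _ in range(200):
    w, b = adv_train_step(w, b, X, y, eps=0.1, lr=0.5)
```

After training, the adversarial loss is lower than it was at initialization, even though each update was computed against perturbed inputs rather than the clean data.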
Tradeoffs
Adversarial training can:
- improve robustness to adversarial attacks
- sometimes improve stability under small input shifts
- significantly increase training cost because adversarial examples must be generated during training
- sometimes reduce clean-data accuracy if robustness is pushed too aggressively
Important distinction
The term adversarial training usually refers to making a single model robust to adversarial examples.
This is different from training one model against another in the style of a generative adversarial network, where two models are explicitly optimized against each other.