Adversarial training is a method for making a model more robust by training it not just on normal examples, but also on intentionally perturbed inputs designed to fool it.

These perturbed inputs are called adversarial examples.

Core idea

Instead of only minimizing the loss on clean training data, adversarial training also tries to minimize the loss on the “worst” small perturbations of that data.

So during training:

  • start with a normal input
  • generate a modified version that makes the model more likely to fail
  • train the model on both the clean and adversarial versions

This makes the model learn decision boundaries that are less fragile.
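The loop above can be sketched with a tiny self-contained example. The following is a minimal illustration, not a production recipe: it trains a logistic-regression classifier on toy data and generates adversarial versions with a single gradient-sign step (an FGSM-style perturbation). All data, names, and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data: two Gaussian blobs.
X = np.vstack([rng.normal(-1.0, 0.7, size=(100, 2)),
               rng.normal(+1.0, 0.7, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100, dtype=float)

w = np.zeros(2)
b = 0.0
lr = 0.1    # learning rate for the parameter update
eps = 0.2   # size of the adversarial perturbation

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(200):
    # Step 1: generate a modified version of each input that pushes
    # the model toward failure. For logistic loss the gradient with
    # respect to the input is (p - y) * w, so moving each input a
    # small step in the sign of that gradient increases the loss.
    p = sigmoid(X @ w + b)
    grad_x = (p - y)[:, None] * w[None, :]
    X_adv = X + eps * np.sign(grad_x)

    # Step 2: train on both the clean and the adversarial versions.
    for Xb in (X, X_adv):
        p = sigmoid(Xb @ w + b)
        grad_w = Xb.T @ (p - y) / len(y)
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b

# Accuracy on the clean data after robust training.
acc = np.mean((sigmoid(X @ w + b) > 0.5) == (y == 1))
print(f"clean accuracy: {acc:.2f}")
```

Because the perturbations are recomputed against the current model at every iteration, the "hard" examples track the model as it improves, which is what distinguishes this from fixed data augmentation.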

Why it works

Many machine learning models, especially neural networks, can be sensitive to tiny input changes that are almost invisible to humans but still cause wrong predictions.

Adversarial training exposes the model to these difficult cases during learning, so the model becomes harder to attack.

In that sense, adversarial training is similar to a targeted form of data augmentation, except the added examples are chosen specifically to be difficult for the current model.

Optimization view

Ordinary training tries to reduce prediction error on the training set.

Adversarial training is closer to a min-max problem:

  • an inner step searches for a perturbation that increases the loss
  • an outer step updates the model parameters to reduce that worst-case loss

The outer update is still usually done with methods based on Gradient Descent.
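Written as a formula (all notation introduced here: theta for the model parameters, delta for the perturbation, epsilon for the allowed perturbation size, and L for the loss), the min-max objective is commonly stated as:

```latex
\min_{\theta} \; \mathbb{E}_{(x,\,y) \sim \mathcal{D}}
  \left[ \max_{\|\delta\| \le \epsilon} L\big(f_\theta(x + \delta),\, y\big) \right]
```

The inner maximization over delta is the perturbation search, and the outer minimization over theta is the usual parameter update.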

Tradeoffs

Adversarial training can:

  • improve robustness to adversarial attacks
  • sometimes improve stability under small input shifts
  • significantly increase training cost because adversarial examples must be generated during training
  • sometimes reduce clean-data accuracy if robustness is pushed too aggressively

Important distinction

In machine learning, "adversarial training" usually refers to training a model for robustness against adversarial examples, as described above.

This is different from training one model against another in the style of a generative adversarial network, where two models are explicitly optimized against each other.