Adversarial Training

Adversarial Training (Robust Optimization for DL) · Also known as: Min-Max Robust Training, PGD Adversarial Training, Robust Empirical Risk Minimization, Hasımsal Eğitim (Turkish)

Adversarial Training is a robust optimization procedure for deep neural networks in which the model is trained not on clean data alone but on worst-case perturbed inputs crafted during training. Formalized by Madry et al. (2018) as a min-max saddle-point problem, the method uses Projected Gradient Descent (PGD) to generate strong adversarial examples within a bounded Lp perturbation set before each gradient update, forcing the network to learn decision boundaries that are stable under such perturbations.
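For reference, the saddle-point objective and the L∞ PGD step used for the inner maximization can be written as follows. The notation loosely follows Madry et al. (2018) and should be checked against the paper: θ are the model parameters, L the training loss, 𝒟 the data distribution, 𝒮 the perturbation set (here the L∞ ball of radius ε), α the step size, t the PGD iteration, and Π the projection back onto the ε-ball around x.

```latex
\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}}
  \Big[ \max_{\delta \in \mathcal{S}} L(\theta,\, x + \delta,\, y) \Big],
\qquad \mathcal{S} = \{\delta : \|\delta\|_\infty \le \epsilon\}

x^{t+1} = \Pi_{x+\mathcal{S}}\!\Big( x^{t} + \alpha \,\operatorname{sgn}\big(\nabla_{x} L(\theta,\, x^{t},\, y)\big) \Big)
```

The outer minimum is taken by ordinary stochastic gradient descent, but each gradient is computed at the approximate inner maximizer returned by the PGD iterations instead of at the clean input.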

When to use it

Use adversarial training when deploying models in security-sensitive environments where inputs may be deliberately manipulated, such as malware detection, facial recognition under physical perturbations, or autonomous driving perception. It assumes a white-box threat model and a fixed perturbation budget epsilon. The method is unsuitable when computational resources are severely constrained: PGD adversarial training (PGD-AT) requires roughly 7 to 20 times more computation per epoch than standard training, because a K-step attack adds K forward-backward passes to every parameter update. When formal guarantees are needed, consider certified defenses such as randomized smoothing instead.
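To make the min-max loop and its cost concrete, here is a minimal pure-Python sketch of PGD adversarial training for a bias-free logistic-regression model on a tiny hypothetical dataset. Everything in it (the function names `grads`, `pgd_linf` and `adversarial_train`, the four data points, ε = 0.1, step size α = 0.025, K = 7 steps, the learning rate) is an illustrative assumption, not the setup of Madry et al.; a practical implementation would use a deep-learning framework and a deep network. Each parameter update needs K passes for the attack plus one for the update, which is the source of the overhead described above.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sign(v):
    return (v > 0) - (v < 0)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def grads(w, x, y):
    """One forward-backward pass of the logistic loss log(1 + exp(-y * w.x)),
    y in {-1, +1}: returns the gradients w.r.t. the input x and the weights w."""
    s = -y * sigmoid(-y * dot(w, x))  # dLoss / d(w.x)
    return [s * wi for wi in w], [s * xi for xi in x]


def pgd_linf(w, x, y, eps, alpha, steps, rng):
    """Inner maximization: random start inside the eps-ball, then `steps`
    sign-gradient ascent steps, each projected back onto the L-inf ball around x."""
    x_adv = [xi + rng.uniform(-eps, eps) for xi in x]
    for _ in range(steps):
        gx, _ = grads(w, x_adv, y)
        x_adv = [min(max(a + alpha * sign(g), xi - eps), xi + eps)
                 for a, g, xi in zip(x_adv, gx, x)]
    return x_adv


def adversarial_train(data, eps, alpha, steps, lr=0.1, epochs=5, seed=0):
    """Outer minimization: plain SGD, but on PGD examples instead of clean ones."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    grad_passes = 0
    for _ in range(epochs):
        for x, y in data:
            x_adv = pgd_linf(w, x, y, eps, alpha, steps, rng)  # `steps` passes
            _, gw = grads(w, x_adv, y)                          # +1 pass for the update
            w = [wi - lr * g for wi, g in zip(w, gw)]
            grad_passes += steps + 1
    return w, grad_passes


def robust_correct(w, x, y, eps):
    """Exact L-inf worst case for a linear model: the margin drops by eps * ||w||_1."""
    return y * dot(w, x) - eps * sum(abs(wi) for wi in w) > 0


# Hypothetical, linearly separable toy data: (features, label in {-1, +1}).
data = [([1.0, 1.0], 1), ([1.2, 0.8], 1), ([-1.0, -1.0], -1), ([-0.8, -1.2], -1)]
eps, steps, epochs = 0.1, 7, 5

w, passes = adversarial_train(data, eps=eps, alpha=0.025, steps=steps, epochs=epochs)
robust_acc = sum(robust_correct(w, x, y, eps) for x, y in data) / len(data)

# Standard SGD needs one pass per example per epoch; this loop needs steps + 1.
print("weights:", w)
print("passes per example per epoch:", passes / (epochs * len(data)))
print("robust accuracy at eps:", robust_acc)
```

Because this model is linear, the inner maximum actually has a closed form (one full-budget sign step), so PGD appears here only to show the structure of the loop; the iterative attack earns its cost on non-linear networks, where no closed form exists.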

Strengths & limitations

Strengths

Provides strong empirical robustness against a wide class of gradient-based attacks under the specified threat model
Directly optimizes for worst-case performance, making the defense grounded in a principled min-max formulation
Compatible with any differentiable architecture and loss function, requiring no changes to model structure
Serves as the de facto baseline defense on adversarial robustness benchmarks in the adversarial ML literature

Limitations

Significantly increases training cost due to multi-step inner PGD optimization at every gradient update
Often reduces clean accuracy compared to standard training, reflecting a robustness-accuracy trade-off
Robustness is typically limited to the threat model used during training; transferability across norms (L∞ vs L2) is imperfect
Does not provide formal certified guarantees; a sufficiently strong or unbounded attacker can still succeed
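The norm-mismatch limitation has a simple geometric side. The pure-Python sketch below (helper names such as `project_linf` and `project_l2` are hypothetical) builds a full-budget L∞ perturbation for an input with as many values as a CIFAR-10 image (32 × 32 × 3) at ε = 8/255, then projects it onto an L2 ball of the same radius. The L∞ "corner" has L2 norm ε√d, about 1.74 here, while the L2-projected version moves each value by only ε/√d. Balls of the same numerical radius therefore contain very different perturbations, which is one reason robustness trained for one norm need not carry over to the other.

```python
import math


def l2_norm(v):
    return math.sqrt(sum(a * a for a in v))


def linf_norm(v):
    return max(abs(a) for a in v)


def project_linf(delta, eps):
    """Clip every coordinate of the perturbation into [-eps, eps]."""
    return [min(max(a, -eps), eps) for a in delta]


def project_l2(delta, eps):
    """Rescale the perturbation onto the L2 ball of radius eps if it lies outside."""
    norm = l2_norm(delta)
    return list(delta) if norm <= eps else [a * eps / norm for a in delta]


d = 32 * 32 * 3  # number of input values in one CIFAR-10-sized image
eps = 8 / 255

corner = project_linf([1.0] * d, eps)  # every value moved by the full L-inf budget
print(linf_norm(corner), l2_norm(corner))  # eps, and eps * sqrt(d)

shrunk = project_l2(corner, eps)  # same direction, forced into the L2 ball of radius eps
print(l2_norm(shrunk), linf_norm(shrunk))  # eps, and eps / sqrt(d) per value
```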

Frequently asked

What is the difference between FGSM training and PGD adversarial training?

FGSM (Fast Gradient Sign Method) training generates adversarial examples with a single gradient-sign step, which is fast but weak. Single-step training is prone to gradient masking, an illusion of robustness that breaks against multi-step attacks, and Madry et al. observed that networks trained on FGSM examples are not robust to multi-step PGD. PGD training takes multiple projected gradient steps to find stronger adversarial examples, producing robustness that holds up against a wider range of first-order attacks at the cost of more computation per update.
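A toy sketch of why one step can be weaker than an iterative attack, in plain Python with hypothetical names (`fgsm`, `pgd`, and a made-up one-dimensional "loss" standing in for a network's loss surface). When the loss is linear in the input, the single full-budget sign step is already the exact worst case inside the L∞ ball; the gap appears only when the loss is curved. Here the maximizer lies inside the ball, so FGSM overshoots to the boundary while PGD's small projected steps settle near the maximum.

```python
def sign(v):
    return (v > 0) - (v < 0)


def project_linf(x_adv, x, eps):
    """Clip every coordinate of x_adv into [x_i - eps, x_i + eps]."""
    return [min(max(a, xi - eps), xi + eps) for a, xi in zip(x_adv, x)]


def fgsm(x, grad_fn, eps):
    """FGSM: a single step of size eps along the sign of the input gradient."""
    return project_linf([xi + eps * sign(g) for xi, g in zip(x, grad_fn(x))], x, eps)


def pgd(x, grad_fn, eps, alpha, steps):
    """PGD without random start: repeated sign-gradient steps of size alpha,
    each followed by projection back onto the L-inf ball around x."""
    x_adv = list(x)
    for _ in range(steps):
        stepped = [a + alpha * sign(g) for a, g in zip(x_adv, grad_fn(x_adv))]
        x_adv = project_linf(stepped, x, eps)
    return x_adv


# Hypothetical curved "loss" with its maximum inside the ball, at 0.3.
def loss(v):
    return -(v[0] - 0.3) ** 2


def grad(v):
    return [-2.0 * (v[0] - 0.3)]


x0, eps = [0.0], 1.0
x_fgsm = fgsm(x0, grad, eps)                     # jumps to the boundary and overshoots
x_pgd = pgd(x0, grad, eps, alpha=0.1, steps=10)  # small steps end near the maximizer
print("FGSM loss:", loss(x_fgsm), "PGD loss:", loss(x_pgd))
```

With a fixed step size, PGD oscillates around the maximizer rather than landing on it exactly, which is why practical setups tune α relative to ε and the number of steps.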

Does adversarial training guarantee robustness?

No. Adversarial training provides strong empirical robustness against attacks within the specified threat model but does not offer formal certified guarantees. An attacker operating outside the assumed budget or norm can still find adversarial inputs, and even within the budget, the measured robustness is only as reliable as the attacks used to evaluate it. Certified methods such as randomized smoothing or interval bound propagation are needed when provable robustness is required.

How should the perturbation budget epsilon be chosen?

Epsilon should reflect the realistic threat level in the deployment context. For CIFAR-10 under the L∞ norm, with pixel values scaled to [0, 1], epsilon = 8/255 is a common benchmark convention. Too small an epsilon yields negligible robustness; too large an epsilon causes severe clean-accuracy degradation and can make the task ill-posed, because perturbations that large may change the true class of the input. Validate the choice by measuring both clean accuracy and robust accuracy at the selected budget.
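The validation step can be illustrated with a toy linear classifier, for which the worst-case L∞ attack has a closed form: it lowers the margin y(w·x + b) by exactly ε‖w‖₁. The weights, bias and four pixel-scale points below are hypothetical, and `linf_robust_accuracy` is an illustrative helper. On the [0, 1] scale, 8/255 ≈ 0.031, so each pixel may move by at most 8 of 255 intensity levels. For deep networks no closed form exists, and robust accuracy is instead estimated with a strong attack such as multi-step PGD, which yields only an upper bound on the true robust accuracy.

```python
def linf_robust_accuracy(w, b, data, eps):
    """Fraction of points that the linear classifier sign(w.x + b) still gets
    right under the worst-case L-inf perturbation of size eps (eps = 0 gives
    clean accuracy). The worst case lowers each margin by eps * ||w||_1."""
    l1 = sum(abs(wi) for wi in w)
    correct = 0
    for x, y in data:
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        correct += margin - eps * l1 > 0
    return correct / len(data)


# Hypothetical fixed classifier and pixel-scale data in [0, 1].
w, b = [1.0, -1.0], 0.0
data = [([0.60, 0.50], 1), ([0.52, 0.50], 1), ([0.40, 0.60], -1), ([0.50, 0.45], -1)]

for levels in (0, 8, 16, 32):
    eps = levels / 255  # budget expressed in 8-bit intensity levels
    print(f"eps = {levels}/255: accuracy {linf_robust_accuracy(w, b, data, eps):.2f}")
```

Reading clean accuracy (the eps = 0 row) next to robust accuracy at the chosen budget makes the trade-off visible: here one point is already misclassified when clean, and points with small margins drop out as the budget grows.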

Sources

Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2018). Towards deep learning models resistant to adversarial attacks. International Conference on Learning Representations (ICLR).

How to cite this page

ScholarGate. (2026, June 2). Adversarial Training (Robust Optimization for DL). ScholarGate. https://scholargate.app/en/deep-learning/adversarial-training

Which method?

Data Augmentation (Deep learning)
Generative Adversarial Network (Deep learning)
Out-of-Distribution Detection (Machine learning)

Referenced by

Data Augmentation

Related reference concepts

Backpropagation and Optimization
Deep Generative Models
Stochastic Optimization
Hyperparameter Optimization
Generalization Bounds
Deep Learning
