What question did this study set out to answer?

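This study systematically evaluates how effective four adversarial attack algorithms (FGSM, PGD, Carlini-Wagner, and DeepFool) are against convolutional neural network classifiers on MNIST, and how much adversarial training restores robustness.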

June 15, 2026 · Open Access

Empirical Evaluation of Adversarial Attack Families on CNN Classifiers: FGSM, PGD, Carlini-Wagner, and DeepFool under ℓ∞ and ℓ₂ Perturbation Budgets

Key Points

Empirical evaluation of FGSM, PGD, Carlini-Wagner, and DeepFool on the MNIST dataset using PyTorch.
Assessment of model robustness under ℓ∞ and ℓ₂ perturbation budgets of ε ∈ {0.01, 0.03, 0.10}.
Quantitative evaluation of adversarial training's impact on robustness and accuracy.
At perturbation ε = 0.03, PGD reduced baseline CNN accuracy from 99% to 50%.
Adversarial training achieved 72% robust accuracy against PGD at ε = 0.03, sacrificing 6 percentage points of clean accuracy.
Demonstrated effectiveness of adversarial training as a defense mechanism.
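
The two ℓ∞ attacks named in the key points can be illustrated with a minimal PyTorch sketch. This is not the preprint's code: the function names (`fgsm`, `pgd`, `accuracy`), the step size, the iteration count, the random start, and the linear stand-in model are all assumptions made for illustration, and the inputs are random tensors rather than MNIST images.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def fgsm(model, x, y, eps):
    """Single-step l-inf attack: move each pixel by eps along the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    return (x_adv + eps * grad.sign()).clamp(0.0, 1.0).detach()


def pgd(model, x, y, eps, step_size, steps):
    """Iterated signed-gradient ascent, projected back onto the l-inf ball of radius eps around x."""
    x = x.detach()
    # Random start inside the eps-ball (an assumption; the preprint does not state its initialisation).
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + step_size * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # projection onto the eps-ball
        x_adv = x_adv.clamp(0.0, 1.0)  # keep valid pixel range
    return x_adv.detach()


@torch.no_grad()
def accuracy(model, x, y):
    """Fraction of inputs classified correctly."""
    return (model(x).argmax(dim=1) == y).float().mean().item()


# Hypothetical stand-ins: a linear classifier and random "images" in [0, 1].
torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).eval()
x = torch.rand(8, 1, 28, 28)
y = torch.randint(0, 10, (8,))

x_pgd = pgd(model, x, y, eps=0.03, step_size=0.01, steps=10)
print("clean accuracy:", accuracy(model, x, y))
print("accuracy under PGD:", accuracy(model, x_pgd, y))
```

Robust accuracy, as reported in the key points, would be the second number computed on the real test set with a trained model; the values printed here are meaningless because the stand-in model is untrained.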

Abstract

Overview

This preprint presents a systematic empirical evaluation of four canonical adversarial attack algorithms against convolutional neural network (CNN) classifiers:

Fast Gradient Sign Method (FGSM)
Projected Gradient Descent (PGD)
Carlini-Wagner L₂ (C&W)
DeepFool

The study evaluates model robustness under both ℓ∞ and ℓ₂ threat models across multiple perturbation budgets and investigates the effectiveness of adversarial training as a defense mechanism.

Abstract

Standard evaluation of machine learning models measures accuracy on clean test data, a metric that collapses under adversarial perturbations. This paper presents a systematic empirical evaluation of four canonical adversarial attack algorithms, namely Fast Gradient Sign Method (FGSM), Projected Gradient Descent (PGD), Carlini-Wagner L₂ (C&W), and DeepFool, applied to convolutional neural network classifiers on the MNIST dataset. We evaluate robustness degradation across perturbation budgets ε ∈ {0.01, 0.03, 0.10} under ℓ∞ and ℓ₂ threat models, quantifying the standard-versus-robust accuracy gap. We further evaluate adversarial training (Madry et al., 2018) as an empirical defense mechanism, measuring the robustness–accuracy tradeoff post-hardening across all attack families. Results show that PGD at ε = 0.03 is the strongest white-box first-order attack, reducing baseline CNN accuracy from 99% to 50%. Adversarial training achieves 72% robust accuracy against PGD at ε = 0.03 at a cost of 6 percentage points on clean accuracy. All implementations are developed from scratch in pure PyTorch without Foolbox or ART dependencies.

Key Contributions

From-scratch implementations of FGSM, PGD, Carlini-Wagner, and DeepFool attacks.
Empirical robustness evaluation across multiple perturbation budgets.
Quantitative assessment of adversarial training defenses.
Reproducible experimental framework for adversarial robustness research.
Open-source implementation designed for robustness auditing and educational use.

Experimental Setting

Dataset: MNIST
Models: Custom CNN, ResNet18 (transferability experiments)
Framework: PyTorch
Threat Models: ℓ∞ and ℓ₂
Defense: PGD-based Adversarial Training
Evaluation Metric: Standard and Robust Accuracy

Keywords

Adversarial Machine Learning, Adversarial Robustness, FGSM, PGD, Carlini-Wagner Attack, DeepFool, Adversarial Training, CNN, MNIST, PyTorch, Cybersecurity, Artificial Intelligence.
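
For reference, the gradient-based attacks and the defense discussed in the abstract are usually written as follows in the literature. The notation is an assumption, not taken from the preprint: $f_\theta$ is the classifier, $\mathcal{L}$ the cross-entropy loss, $\alpha$ the PGD step size, and $\Pi$ the projection onto the ε-ball around the clean input $x$. The preprint's exact step sizes and iteration counts are not given on this page.

```latex
% Standard textbook formulations; symbols are illustrative, not the preprint's own.
\begin{align*}
\text{FGSM:}\quad
  & x^{\mathrm{adv}} = x + \epsilon\,\operatorname{sign}\!\big(\nabla_x \mathcal{L}(f_\theta(x), y)\big) \\
\text{PGD:}\quad
  & x^{(t+1)} = \Pi_{\{x' :\, \|x' - x\|_\infty \le \epsilon\}}
    \Big(x^{(t)} + \alpha\,\operatorname{sign}\!\big(\nabla_x \mathcal{L}(f_\theta(x^{(t)}), y)\big)\Big) \\
\text{Adversarial training:}\quad
  & \min_\theta \; \mathbb{E}_{(x,y)\sim\mathcal{D}}
    \Big[\max_{\|\delta\|_p \le \epsilon} \mathcal{L}\big(f_\theta(x+\delta), y\big)\Big]
\end{align*}
```

In PGD-based adversarial training, the inner maximization is approximated by running PGD on each training batch, which is why robustness against PGD improves at some cost in clean accuracy.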
