Empirical Evaluation of Adversarial Attack Families on CNN Classifiers

Overview

This preprint presents a systematic empirical evaluation of four canonical adversarial attack algorithms against convolutional neural network (CNN) classifiers:

- Fast Gradient Sign Method (FGSM)
- Projected Gradient Descent (PGD)
- Carlini-Wagner L₂ (C&W)
- DeepFool

The study evaluates model robustness under both ℓ∞ and ℓ₂ threat models across multiple perturbation budgets and investigates the effectiveness of adversarial training as a defense mechanism.

Abstract

Standard evaluation of machine learning models measures accuracy on clean test data, a metric that collapses under adversarial perturbations. This paper presents a systematic empirical evaluation of four canonical adversarial attack algorithms (Fast Gradient Sign Method (FGSM), Projected Gradient Descent (PGD), Carlini-Wagner L₂ (C&W), and DeepFool) applied to convolutional neural network classifiers on the MNIST dataset. We evaluate robustness degradation across perturbation budgets ε ∈ {0.01, 0.03, 0.10} under ℓ∞ and ℓ₂ threat models, quantifying the gap between standard and robust accuracy. We further evaluate adversarial training (Madry et al., 2018) as an empirical defense mechanism, measuring the robustness–accuracy tradeoff after hardening across all attack families. Results show that PGD at ε = 0.03 is the strongest white-box first-order attack, reducing baseline CNN accuracy from 99% to 50%. Adversarial training achieves 72% robust accuracy against PGD at ε = 0.03 at a cost of 6 percentage points of clean accuracy. All implementations are developed from scratch in pure PyTorch, without Foolbox or ART dependencies.

Key Contributions

- From-scratch implementations of the FGSM, PGD, Carlini-Wagner, and DeepFool attacks.
- Empirical robustness evaluation across multiple perturbation budgets.
- Quantitative assessment of adversarial training defenses.
- Reproducible experimental framework for adversarial robustness research.
- Open-source implementation designed for robustness auditing and educational use.

Experimental Setting

- Dataset: MNIST
- Models: Custom CNN, ResNet18 (transferability experiments)
- Framework: PyTorch
- Threat Models: ℓ∞ and ℓ₂
- Defense: PGD-based Adversarial Training
- Evaluation Metric: Standard and Robust Accuracy

Keywords

Adversarial Machine Learning, Adversarial Robustness, FGSM, PGD, Carlini-Wagner Attack, DeepFool, Adversarial Training, CNN, MNIST, PyTorch, Cybersecurity, Artificial Intelligence.
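Illustrative sketch

To make the ℓ∞ threat model and the standard-versus-robust accuracy metric concrete, the following is a minimal PyTorch sketch of FGSM and PGD. It is not the preprint's code. The function names (fgsm, pgd, accuracy), the step size alpha, the iteration count steps, the random start, and the assumption that pixel values lie in [0, 1] are all illustrative choices that the listing above does not specify.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def fgsm(model, x, y, eps):
    """Single-step l_inf attack: shift every pixel by eps along the
    sign of the loss gradient, then clip to the assumed [0, 1] range."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    (grad,) = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).clamp(0.0, 1.0).detach()


def pgd(model, x, y, eps, alpha, steps):
    """Iterated signed-gradient ascent with a random start; after each
    step the iterate is projected back onto the l_inf ball of radius eps
    around the clean input (alpha and steps are assumed hyperparameters)."""
    x = x.clone().detach()
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        (grad,) = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # projection
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()


def accuracy(model, x, y):
    """Fraction of inputs classified correctly."""
    with torch.no_grad():
        return (model(x).argmax(dim=1) == y).float().mean().item()


if __name__ == "__main__":
    torch.manual_seed(0)
    # Stand-in linear model and random data, NOT the paper's CNN or MNIST.
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).eval()
    x = torch.rand(16, 1, 28, 28)
    y = torch.randint(0, 10, (16,))
    x_pgd = pgd(model, x, y, eps=0.03, alpha=0.01, steps=10)
    print("standard accuracy:", accuracy(model, x, y))
    print("robust accuracy (PGD):", accuracy(model, x_pgd, y))
```

Standard accuracy is accuracy(model, x, y) on clean inputs; robust accuracy is the same measurement on the attacked inputs, and the gap between the two is the quantity reported in the abstract. Under this sketch, PGD-based adversarial training would amount to computing the training loss on pgd(...) outputs instead of clean batches.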
Aaryan Paliwal (Sat,) studied this question.