This paper presents an adversarial testing framework for evaluating the robustness of machine-learning-based phishing detection systems against AI-generated adversarial email content. Two TF-IDF-based classifiers, Logistic Regression and a Support Vector Machine, are evaluated on a balanced dataset of 1,600 emails under both standard and adversarially transformed conditions. Adversarial samples are generated through synonym substitution and structural camouflage techniques. Results show statistically significant performance degradation under adversarial conditions and reveal a critical asymmetric failure mode: both classifiers retain a 0.0% false positive rate while their detection sensitivity drops substantially. Because evasion produces no accompanying rise in false alarms, nothing in the classifiers' output signals that anything is wrong, which creates a silent evasion channel in deployed systems. Statistical significance is assessed with McNemar's test and bootstrap confidence intervals. This work establishes adversarial robustness evaluation as a necessary component of phishing detection assessment, particularly as AI-generated content increasingly characterises real-world attack vectors.
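The evaluation quantities named above (detection sensitivity, false positive rate, McNemar's test and bootstrap confidence intervals) can be sketched in a few lines of dependency-free Python. This is a minimal illustration, not the authors' implementation. It assumes binary labels (1 = phishing, 0 = legitimate) and paired predictions, meaning each adversarial email is a transformed version of a clean one so the same index refers to the same underlying message. The function names `rates`, `mcnemar` and `bootstrap_ci` and the toy data are hypothetical. The McNemar variant shown is the chi-square approximation with continuity correction; the paper may use a different form, such as the exact binomial test.

```python
import math
import random

# Assumed label encoding: 1 = phishing, 0 = legitimate.

def rates(y_true, y_pred):
    """Return (true positive rate, false positive rate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


def mcnemar(y_true, pred_a, pred_b):
    """McNemar's chi-square test with continuity correction on paired predictions.

    Only discordant pairs matter: b counts samples A gets right and B gets wrong,
    c counts the reverse.
    """
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    if b + c == 0:
        return 0.0, 1.0  # no disagreements, so no evidence of a difference
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for metric(y_true, y_pred)."""
    rng = random.Random(seed)
    n = len(y_true)
    samples = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample emails with replacement
        samples.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    samples.sort()
    lower = samples[int(alpha / 2 * n_boot)]
    upper = samples[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    clean = [1, 1, 1, 1, 0, 0, 0, 0]        # hypothetical predictions on original emails
    adversarial = [1, 0, 0, 0, 0, 0, 0, 0]  # hypothetical predictions after transformation
    print(rates(y_true, adversarial))        # (0.25, 0.0)
    print(mcnemar(y_true, clean, adversarial))
    print(bootstrap_ci(y_true, adversarial, lambda t, p: rates(t, p)[0]))
```

The toy data reproduce the asymmetric pattern in miniature: the false positive rate stays at 0.0 while the true positive rate falls from 1.0 to 0.25. Eight samples are far too few for the chi-square approximation to be reliable, so the sketch only illustrates the mechanics of the tests.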
Arnika Madushani Rangalla (Mon,) studied this question.