AI-generated phishing emails present a growing cybersecurity threat, exploiting human psychology with high-quality, context-aware language. This paper introduces a novel two-stage detection framework that combines deep learning with psychological analysis to address this challenge. A new dataset containing 2995 GPT-o1-generated phishing emails, each labelled with Cialdini’s six persuasion principles, is created across five organisational sectors—forming one of the largest and most behaviourally annotated corpora in the field. The first stage employs a fine-tuned DistilBERT model to predict the presence of persuasion principles in each email. These confidence scores then feed into a lightweight dense neural network at the second stage for final binary classification. This interpretable design balances performance with insight into attacker strategies. The full system achieves 94% accuracy and 98% AUC, outperforming comparable methods while offering a clearer explanation of model decisions. Analysis shows that principles like authority, scarcity, and social proof are highly indicative of phishing, while reciprocation and likeability occur more often in legitimate emails. This research contributes an interpretable, psychology-informed framework for phishing detection, alongside a unique dataset for future study. Results demonstrate the value of behavioural cues in identifying sophisticated phishing attacks and suggest broader applications in detecting malicious AI-generated content.
Peter Tooher (Mon,) studied this question.
Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context: