Phishing emails represent a critical cybersecurity challenge, exploiting human vulnerability to illicitly obtain sensitive personal and financial data. This study proposes a novel content‐based filtering framework for phishing email detection, employing Global Vectors for Word Representation (GloVe) embeddings for semantic feature extraction and a stacked ensemble machine learning architecture for classification. Four publicly available datasets were systematically processed and merged to ensure diversity and robustness in model training. The proposed hierarchical architecture integrates three distinct base learners, namely K‐nearest neighbors (KNN), light gradient boosting machine (LGBM), and the extremely randomized trees (Extra Trees) classifier (ETC), with an optimized extreme gradient boosting (XGBoost) model as the meta‐classifier. Comparative experimental analyses demonstrate that this ensemble model outperforms existing methodologies. Empirical evaluations reveal state‐of‐the‐art metrics: 99.84% accuracy, 99.87% precision, 99.74% recall, and a 99.81% F1‐score, with an error rate of 0.16%. The results substantiate the framework’s efficacy in mitigating phishing threats, exhibiting statistically significant improvements in both discriminative capability and operational reliability compared to prior approaches. This advancement underscores the potential of stacked ensemble learning combined with semantic feature representation to address evolving cybersecurity vulnerabilities.
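For readers less familiar with the reported evaluation metrics, the following minimal sketch shows how accuracy, precision, recall, F1-score, and error rate are conventionally derived from a binary confusion matrix, with phishing treated as the positive class. The names `ConfusionCounts` and `classification_metrics` and the example counts are hypothetical and chosen only for illustration; they are not taken from the study's code or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with phishing as the positive class."""
    tp: int  # phishing emails flagged as phishing
    fp: int  # legitimate emails flagged as phishing
    tn: int  # legitimate emails passed as legitimate
    fn: int  # phishing emails passed as legitimate


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return the standard metrics as fractions in [0, 1].

    Assumes every denominator is non-zero, i.e. there is at least one
    predicted positive and at least one actual positive.
    """
    total = c.tp + c.fp + c.tn + c.fn
    accuracy = (c.tp + c.tn) / total
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Error rate is the complement of accuracy.
        "error_rate": 1.0 - accuracy,
    }


# Hypothetical counts for illustration only (not the paper's data).
counts = ConfusionCounts(tp=90, fp=5, tn=895, fn=10)
metrics = classification_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Under these definitions the error rate is simply one minus the accuracy, which is consistent with the reported pairing of 99.84% accuracy and a 0.16% error rate.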
Maeli et al. studied this question.