What question did this study set out to answer?

This research aims to enhance phishing email detection through a content-based filtering framework and advanced machine learning techniques.

March 25, 2026 | Open Access

Detecting Phishing Emails Using GloVe Word Embeddings and Ensemble Machine Learning Model

Key Points

Developed a content-based filtering framework for phishing detection.
Utilized GloVe embeddings for semantic feature extraction.
Employed a stacked ensemble with KNN, LGBM, and Extra Trees as base learners and XGBoost as the meta-classifier.
Processed and combined four publicly available datasets for robust model training.
Achieved 99.84% accuracy in phishing detection.
Obtained 99.87% precision and 99.74% recall.
Achieved an F1-score of 99.81% with a minimal error rate of 0.16%.
Demonstrated superior performance compared to previous methodologies.
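
The reported metrics are related to one another through the standard confusion-matrix definitions. The sketch below shows those relationships; the function name and the confusion-matrix counts are hypothetical and chosen only for illustration, since the study's actual counts are not given here. The last lines check that the reported precision and recall are consistent with the reported F1-score.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp: phishing emails flagged as phishing
    fp: legitimate emails flagged as phishing
    tn: legitimate emails passed as legitimate
    fn: phishing emails passed as legitimate
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    error_rate = (fp + fn) / total  # equivalently 1 - accuracy
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "error_rate": error_rate,
    }


# Hypothetical counts for illustration only (not taken from the study).
metrics = classification_metrics(tp=90, fp=10, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")

# Consistency check on the reported figures: F1 is the harmonic mean of
# precision and recall, and the error rate is the complement of accuracy.
reported_f1 = 2 * 0.9987 * 0.9974 / (0.9987 + 0.9974)
reported_error = 1 - 0.9984
print(f"F1 implied by reported precision/recall: {reported_f1:.4f}")
print(f"Error rate implied by reported accuracy: {reported_error:.4f}")
```

Because the error rate is simply the complement of accuracy, the 0.16% figure follows directly from the 99.84% accuracy and is not an independent measurement.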

Abstract

Phishing emails represent a critical cybersecurity challenge, exploiting human vulnerability to illicitly obtain sensitive personal and financial data. This study proposes a novel content‐based filtering framework for phishing email detection, employing Global Vectors for Word Representation (GloVe) embeddings for semantic feature extraction and a stacked ensemble machine learning architecture for classification. Four publicly available datasets were systematically processed and amalgamated to ensure diversity and robustness in model training. The proposed hierarchical architecture integrates three distinct base learners: K‐nearest neighbors (KNN), light gradient boosting machine (LGBM), and the extremely randomized trees (Extra Trees) classifier (ETC), with extreme gradient boosting (XGBoost) serving as the optimized meta‐classifier. Comparative experimental analyses demonstrate that this ensemble model achieves superior performance relative to existing methodologies. Empirical evaluations reveal state‐of‐the‐art metrics, including 99.84% accuracy, 99.87% precision, 99.74% recall, and a 99.81% F1‐score, coupled with a minimal error rate of 0.16%. The results substantiate the framework’s efficacy in mitigating phishing threats, exhibiting statistically significant improvements in both discriminative capability and operational reliability compared to prior approaches. This advancement underscores the potential of stacked ensemble learning combined with semantic feature representation to address evolving cybersecurity vulnerabilities.
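
To make the semantic feature extraction step concrete, the sketch below turns an email body into a fixed-length vector by averaging the GloVe vectors of its words. Averaging is an assumption made here for illustration; the abstract does not state how word vectors are pooled into an email-level feature. The function names, the whitespace tokenizer, and the tiny three-dimensional vectors are likewise hypothetical. The parser assumes the plain-text layout of the pretrained GloVe files, in which each line holds a token followed by its space-separated vector components.

```python
import io


def load_glove(handle):
    """Parse GloVe's plain-text format: a token, then its vector components."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def embed_email(text, vectors, dim):
    """Average the vectors of known tokens; out-of-vocabulary tokens are skipped.

    Mean pooling is an illustrative assumption, not the study's stated method.
    """
    tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim  # no known words: fall back to a zero vector
    return [sum(column) / len(known) for column in zip(*known)]


# Toy 3-dimensional vectors invented for illustration; pretrained GloVe
# files typically use 50 to 300 dimensions.
toy_file = io.StringIO(
    "verify 1.0 0.0 2.0\n"
    "account 3.0 2.0 0.0\n"
    "meeting 0.0 4.0 4.0\n"
)
glove = load_glove(toy_file)

features = embed_email("Verify your account", glove, dim=3)
print(features)
```

Vectors produced this way would form the feature matrix passed to the base learners, whose predictions the meta-classifier then combines.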
