What question did this study set out to answer?

This research aims to improve phishing email detection using an AI-powered model integrating Natural Language Processing techniques.

June 20, 2026 | Open Access

AI-Powered Threat Hunting for Email Phishing Attack Detection Using Natural Language Processing (NLP)

Key points

Developed the PhishGuard AI application, which uses Natural Language Processing for phishing email detection.
Leveraged Word2Vec with TF-IDF weighting for feature extraction, together with an XGBoost classifier.
Tested the model's robustness and generalisability using CEAS_08.csv and SpamAssasin.csv datasets.
Achieved high computational efficiency and effectiveness in phishing email detection processes.
Demonstrated proactive threat-hunting capabilities, significantly reducing false positive rates.
Contributed to the development of a more robust cybersecurity solution for safeguarding information.

Abstract

Phishing attacks remain a significant cybersecurity threat, aiming to steal sensitive information by exploiting human vulnerability. Traditional phishing email detection often struggles to keep up with attackers' latest strategies, which results in high false positive rates and limited contextual understanding of email content. To address these challenges, this research proposes an AI-powered threat-hunting model that integrates Natural Language Processing (NLP) techniques for detecting phishing emails in English, delivered through the PhishGuard AI application. The application is a web-based software solution designed to be accessible to users both with and without technical expertise. The model leverages Word2Vec with TF-IDF weighting for feature extraction and uses an XGBoost classifier. A comprehensive testing process using various metrics evaluates the computational efficiency and effectiveness of the model. The model's robustness and generalisability were rigorously tested using two distinct datasets: CEAS_08.csv for in-distribution training and SpamAssasin.csv for out-of-distribution evaluation. The primary value of this model lies in its proactive threat-hunting capability, which distinguishes it from reactive systems that rely on known threat examples. The findings of the study aim to advance the domain of phishing email detection and to contribute to the development of a more robust cybersecurity solution that can help safeguard both individuals and organisations in our country.
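The abstract does not spell out how Word2Vec vectors and TF-IDF weights are combined. One common reading is a TF-IDF-weighted average of the word vectors in each email, and the sketch below illustrates only that pooling step. It is not the authors' implementation: the two-dimensional `toy_embeddings`, the `corpus`, and both function names are hypothetical stand-ins, and the IDF uses the smoothed form ln((1 + n) / (1 + df)) + 1. In a real pipeline the embeddings would come from a trained Word2Vec model, and the pooled vectors would be passed to a classifier such as XGBoost.

```python
import math
from collections import Counter


def idf_table(docs):
    """Smoothed inverse document frequency for every token in tokenised docs."""
    n = len(docs)
    # Count each token once per document it appears in.
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def tfidf_weighted_vector(doc, embeddings, idf, dim):
    """Average the word vectors of doc, weighting each token by its TF-IDF score.

    Tokens missing from either the embedding table or the IDF table are skipped.
    Returns a zero vector when no token can be scored.
    """
    counts = Counter(tok for tok in doc if tok in embeddings and tok in idf)
    total = [0.0] * dim
    weight_sum = 0.0
    for tok, count in counts.items():
        weight = (count / len(doc)) * idf[tok]  # term frequency * IDF
        for i, value in enumerate(embeddings[tok]):
            total[i] += weight * value
        weight_sum += weight
    if weight_sum == 0.0:
        return total
    return [x / weight_sum for x in total]


# Hypothetical toy data, for illustration only.
corpus = [
    ["verify", "your", "account", "now"],
    ["meeting", "notes", "attached"],
    ["your", "account", "statement", "attached"],
]
toy_embeddings = {
    "verify": [1.0, 0.0],
    "your": [0.0, 1.0],
    "account": [0.5, 0.5],
    "meeting": [0.2, 0.8],
}

idf = idf_table(corpus)
features = tfidf_weighted_vector(["verify", "your"], toy_embeddings, idf, dim=2)
print(features)
```

Because "verify" occurs in fewer documents of the toy corpus than "your", its vector receives the larger weight, so the pooled vector leans toward the rarer and more distinctive term. That is the motivation for TF-IDF weighting over a plain average.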

Cite This Study

Kumaresan et al. (Sun,) studied this question.

synapsesocial.com/papers/6a362de1db0793dc1a535e9a https://doi.org/10.33093/jiwe.2026.5.2.10
