March 7, 2026 | Open Access

A novel framework for phishing detection based on backward recursive feature selection

Key Points

The research aims to optimize phishing detection systems by improving feature selection techniques.
Developed a structured method combining email parsing and hashing-based feature extraction.
Identified key elements such as linguistic patterns and URLs during the parsing phase.
Applied a hybrid feature selection method merging Recursive Feature Elimination with Backward Elimination.
Created the Recursive Backward Elimination framework for improved machine learning model efficiency.
Achieved peak accuracy of 90.7% with the Ensemble technique.
Showed significant improvements across various classifiers.
Enhanced detection precision while reducing computational expenses.
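
The parsing and hashing steps listed above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the study's implementation. The helper names `parse_email` and `hash_features`, the token prefixes (`hdr:`, `url_host:`, `att:`, `w:`), the bucket count, and the use of SHA-256 with a sign bit are all assumptions made for the example.

```python
import hashlib
import re
from email import message_from_string
from urllib.parse import urlsplit

URL_RE = re.compile(r"https?://[^\s<>\"']+")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def parse_email(raw: str) -> list[str]:
    """Extract string tokens from headers, body words, URLs, and attachments."""
    msg = message_from_string(raw)
    tokens = []
    # Metadata: a few headers that often differ between phishing and legitimate mail.
    for header in ("From", "Reply-To", "Subject"):
        value = msg.get(header)
        if value:
            tokens.append(f"hdr:{header.lower()}={value.strip().lower()}")
    for part in msg.walk():
        if part.is_multipart():
            continue
        filename = part.get_filename()
        if filename:
            # Attachments: keep only the file extension (hypothetical choice).
            tokens.append(f"att:{filename.rsplit('.', 1)[-1].lower()}")
            continue
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True) or b""
            text = payload.decode(part.get_content_charset() or "utf-8", errors="replace")
            # Embedded URLs: record the host, then strip URLs before tokenizing words.
            for url in URL_RE.findall(text):
                tokens.append(f"url_host:{urlsplit(url).hostname}")
            words = TOKEN_RE.findall(URL_RE.sub(" ", text).lower())
            tokens.extend(f"w:{word}" for word in words)
    return tokens


def hash_features(tokens: list[str], n_buckets: int = 64) -> list[int]:
    """Map any number of tokens to a fixed-size vector (signed hashing trick)."""
    vector = [0] * n_buckets
    for token in tokens:
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h % n_buckets              # bucket chosen by the hash value
        sign = 1 if (h >> 63) == 0 else -1  # top bit picks the sign to soften collisions
        vector[index] += sign
    return vector


SAMPLE = (
    "From: Support <support@example.com>\n"
    "Subject: Verify your account\n"
    "\n"
    "Please verify your account at http://login.example.net/verify now.\n"
)

tokens = parse_email(SAMPLE)
vector = hash_features(tokens, n_buckets=32)
print(len(tokens), "tokens hashed into", len(vector), "buckets")
```

The vector length depends only on `n_buckets`, never on vocabulary size, which is what makes hashing attractive for high-dimensional text. The trade-off is that distinct tokens can collide in the same bucket, so the bucket count has to be tuned against the data.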

Abstract

Feature selection plays a crucial role in the accuracy and efficiency of phishing detection systems, and a range of approaches has been proposed to optimize it in order to improve model performance and reduce false positives. This study presents a structured method that combines email parsing with hashing-based feature extraction to improve the accuracy of phishing detection models. The parsing phase identifies key elements such as linguistic patterns, metadata, embedded URLs, and attachments, so that only the most relevant information is passed on for further analysis. A hashing technique then converts the high-dimensional textual data into fixed-size feature vectors, simplifying the representation while retaining the informative content of the text. In addition, the study introduces a hybrid feature selection method that merges Recursive Feature Elimination (RFE) with Backward Elimination (BE) to improve the efficiency of machine learning (ML) models. The proposed framework, Recursive Backward Elimination (RBE), improves detection precision, reduces dimensionality, and lowers computational cost. Results indicate significant improvements across classifiers, with the Ensemble technique reaching a peak accuracy of 90.7%. RBE is designed to support robust ML pipelines that detect phishing attempts accurately while preserving data privacy and scalability.
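
The abstract does not spell out how RFE and BE are merged, so the following is only one plausible reading, written as a standard-library Python sketch rather than the study's algorithm. It assumes that each round ranks the remaining features by the absolute weight of a refitted model (the RFE idea) and that the weakest feature is dropped only if validation accuracy does not fall (the BE idea). The tiny logistic regression, the synthetic data, and every function name here are hypothetical.

```python
import math
import random


def _logit(row, cols, w, b):
    return b + sum(wi * row[c] for wi, c in zip(w, cols))


def fit_logreg(X, y, cols, epochs=200, lr=0.5):
    """Batch gradient descent for logistic regression on the chosen columns."""
    w, b, n = [0.0] * len(cols), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(cols), 0.0
        for row, label in zip(X, y):
            z = max(-30.0, min(30.0, _logit(row, cols, w, b)))  # clamp to avoid overflow
            err = 1.0 / (1.0 + math.exp(-z)) - label
            for k, c in enumerate(cols):
                grad_w[k] += err * row[c]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(X, y, cols, w, b):
    hits = sum((_logit(row, cols, w, b) > 0) == bool(label) for row, label in zip(X, y))
    return hits / len(y)


def recursive_backward_elimination(X_tr, y_tr, X_val, y_val, min_features=1):
    cols = list(range(len(X_tr[0])))
    w, b = fit_logreg(X_tr, y_tr, cols)
    best = accuracy(X_val, y_val, cols, w, b)
    while len(cols) > min_features:
        # RFE-style step: the feature with the smallest |weight| is the candidate.
        weakest = min(range(len(cols)), key=lambda k: abs(w[k]))
        trial = cols[:weakest] + cols[weakest + 1:]
        w_trial, b_trial = fit_logreg(X_tr, y_tr, trial)
        score = accuracy(X_val, y_val, trial, w_trial, b_trial)
        # BE-style step: keep the removal only if validation accuracy does not drop.
        if score < best:
            break
        cols, w, b, best = trial, w_trial, b_trial, score
    return cols, best


def make_data(n, rng):
    """Synthetic rows: one informative column, one noise column, two empty buckets."""
    X, y = [], []
    for _ in range(n):
        signal, noise = rng.uniform(-1, 1), rng.uniform(-1, 1)
        X.append([signal, noise, 0.0, 0.0])
        y.append(1 if signal > 0 else 0)
    return X, y


rng = random.Random(0)
X_tr, y_tr = make_data(40, rng)
X_val, y_val = make_data(20, rng)

w_full, b_full = fit_logreg(X_tr, y_tr, [0, 1, 2, 3])
baseline = accuracy(X_val, y_val, [0, 1, 2, 3], w_full, b_full)
selected, final_score = recursive_backward_elimination(X_tr, y_tr, X_val, y_val)
print("kept columns:", selected, "validation accuracy:", final_score, "baseline:", baseline)
```

The all-zero columns stand in for unused hash buckets, which receive zero weight and are pruned without changing the model's predictions. By construction the loop never accepts a removal that lowers validation accuracy, so the reduced feature set scores at least as well as the full one on the held-out split.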
