Feature selection plays a crucial role in the accuracy and efficiency of phishing detection systems, and several approaches have been proposed to optimize it in order to improve model performance and reduce false positives. One study presents a structured method that combines email parsing with hashing-based feature extraction. During the parsing phase, the method identifies key elements such as linguistic patterns, metadata, embedded URLs, and attachments, so that only the most relevant information is passed on for further analysis. A hashing technique then converts the high-dimensional textual data into fixed-size feature vectors, which simplifies the data while retaining much of its informative content. A second line of research presents a hybrid feature selection method that merges Recursive Feature Elimination (RFE) with Backward Elimination (BE) to improve the efficiency of machine learning (ML) models. The resulting framework, Recursive Backward Elimination (RBE), improves detection precision, reduces dimensionality, and lowers computational cost. The reported results indicate significant improvements across classifiers, with the ensemble technique reaching the highest accuracy, 90.7%. RBE is intended to support robust machine learning pipelines that detect phishing attempts accurately while preserving data privacy and scalability.
Elsoud et al. (Sun,) studied this question.
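To make the parsing and hashing steps described above concrete, the following is a minimal sketch using only the Python standard library. It is not the implementation from the cited work: the function names (`parse_email`, `email_tokens`, `hash_features`), the token prefixes, the bucket count of 32, and the use of MD5 as a stable (non-security) hash are all assumptions chosen for illustration.

```python
import hashlib
import re
from email import message_from_string
from urllib.parse import urlsplit

N_BUCKETS = 32  # assumed vector size; real systems typically use far more
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def parse_email(raw):
    """Extract sender, subject, body text, URLs, and attachment names."""
    msg = message_from_string(raw)
    body_parts, attachments = [], []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        filename = part.get_filename()
        if filename:
            attachments.append(filename)
        elif part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "utf-8"
            body_parts.append(payload.decode(charset, errors="replace"))
    body = "\n".join(body_parts)
    return {
        "from": msg.get("From", ""),
        "subject": msg.get("Subject", ""),
        "body": body,
        "urls": URL_RE.findall(body),
        "attachments": attachments,
    }


def email_tokens(parsed):
    """Turn parsed fields into prefixed string tokens (hypothetical scheme)."""
    toks = ["from=" + parsed["from"].lower()]
    text = (parsed["subject"] + " " + parsed["body"]).lower()
    toks += ["word=" + w for w in re.findall(r"[a-z0-9']+", text)]
    toks += ["url_host=" + urlsplit(u).netloc.lower() for u in parsed["urls"]]
    toks += ["attach_ext=" + a.rsplit(".", 1)[-1].lower()
             for a in parsed["attachments"]]
    return toks


def hash_features(tokens, n_buckets=N_BUCKETS):
    """Signed feature hashing: map any number of tokens to a fixed-size vector."""
    vec = [0.0] * n_buckets
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % n_buckets
        sign = 1.0 if digest[8] & 1 == 0 else -1.0  # sign bit limits collision bias
        vec[index] += sign
    return vec


RAW_EMAIL = (
    "From: Support <support@example.test>\n"
    "Subject: Verify your account\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "Please verify your account at http://login.example.test/verify now.\n"
)

parsed = parse_email(RAW_EMAIL)
vector = hash_features(email_tokens(parsed))
```

Here `vector` always has `N_BUCKETS` entries regardless of message length, which is the property that lets a downstream classifier consume emails of arbitrary size. The trade-off is that distinct tokens may collide in the same bucket, so the bucket count would need tuning on real data.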