Email is one of the most common and cost-effective ways to exchange information. However, its simplicity also makes it highly vulnerable to a range of threats. In many cases, users fall victim to spam: irrelevant, unwanted messages that can cause various problems and security risks. Spam detection has been studied extensively, and researchers continue to seek more effective methods. In this work, we use machine learning algorithms to detect spam automatically, focusing on a novel unified framework that integrates multiple data balancing techniques and feature extraction methods to improve classification accuracy on imbalanced datasets. We tested several feature extraction methods, data balancing techniques, and machine learning algorithms to determine the most effective combination. The best performance was achieved by combining the SMOTEENN resampling method with LinearSVC, which reached an accuracy of 99.3%. These results show that balancing methods significantly influence spam detection performance and support the effectiveness of the proposed framework for machine learning-based email classification.
Nouha Arfaoui studied this question.
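To make the resampling step named in the abstract more concrete, the following is a minimal pure-Python sketch of the idea behind SMOTEENN: SMOTE-style oversampling of the minority class followed by edited-nearest-neighbours (ENN) cleaning. It is not the authors' implementation. The function names (`nearest`, `smote`, `enn`, `smoteenn`), the toy two-dimensional features (stand-ins for vectors produced by a feature extraction step), and the simplifications (exactly two classes, majority-vote ENN, k = 3) are all assumptions made for illustration. In practice one would more likely rely on the `SMOTEENN` class from imbalanced-learn and `LinearSVC` from scikit-learn.

```python
import math
import random
from collections import Counter


def nearest(points, query, k, skip=None):
    """Return indices of the k points closest to `query`, ignoring index `skip`."""
    candidates = (i for i in range(len(points)) if i != skip)
    return sorted(candidates, key=lambda i: math.dist(points[i], query))[:k]


def smote(X, y, minority, k=3, rng=None):
    """Add synthetic minority samples until both classes are the same size.

    Simplification (assumption): exactly two classes are present.
    """
    rng = rng if rng is not None else random.Random(0)
    pool = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(pool)) - len(pool)  # majority count minus minority count
    X_new, y_new = list(X), list(y)
    for _ in range(n_needed):
        i = rng.randrange(len(pool))
        j = rng.choice(nearest(pool, pool[i], k, skip=i))
        gap = rng.random()  # position along the segment between the two samples
        X_new.append([a + gap * (b - a) for a, b in zip(pool[i], pool[j])])
        y_new.append(minority)
    return X_new, y_new


def enn(X, y, k=3):
    """Drop every sample whose label disagrees with the majority of its k neighbours."""
    keep = []
    for i, (x, label) in enumerate(zip(X, y)):
        votes = Counter(y[j] for j in nearest(X, x, k, skip=i))
        if votes.most_common(1)[0][0] == label:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smoteenn(X, y, minority, k=3, seed=0):
    """Oversample with SMOTE, then clean the result with ENN."""
    X_over, y_over = smote(X, y, minority, k, random.Random(seed))
    return enn(X_over, y_over, k)


# Hypothetical toy data: 20 ham points near the origin, 4 spam points near (5, 5),
# and one mislabeled ham point sitting inside the spam cluster.
ham = [[0.1 * i, 0.1 * j] for i in range(4) for j in range(5)]
noisy_ham = [[5.05, 5.05]]
spam = [[5.0, 5.0], [5.2, 5.1], [5.1, 4.9], [4.9, 5.2]]
X = ham + noisy_ham + spam
y = ["ham"] * 21 + ["spam"] * 4

X_res, y_res = smoteenn(X, y, minority="spam")
counts = Counter(y_res)
print(counts["ham"], counts["spam"])  # 20 21
```

On this toy set, oversampling raises the spam class from 4 to 21 samples, and the cleaning step then removes the single ham point embedded in the spam cluster. A classifier such as LinearSVC would then be trained on the resampled data only, with the test split left untouched so that reported accuracy reflects the original class distribution.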