March 3, 2026 | Open Access

Enhancing Machine Learning Algorithms for Imbalanced Data—A Case Study: Spam Detection

Key points

The best configuration reached 99.3% accuracy, which illustrates how strongly data balancing affects spam detection performance.
The best results came from combining the SMOTEENN resampling method with LinearSVC for classification.
The analysis compared several feature extraction methods, data balancing techniques, and machine learning algorithms to find the most effective combination.
The findings indicate that the proposed unified framework is effective at addressing the challenges posed by imbalanced datasets.

Abstract

Email is among the most common and cost-effective ways to share and exchange information. However, its simplicity also makes it highly vulnerable to various threats. In many cases, users become victims of spam: irrelevant, unwanted messages that can cause a range of problems and security risks. Spam detection has been the subject of many studies, and researchers continue to seek more effective methods. In this work, we propose the use of machine learning algorithms to detect spam automatically, focusing on a novel unified framework that integrates multiple data balancing techniques and feature extraction methods to improve classification accuracy on imbalanced datasets. Several feature extraction methods, data balancing techniques, and machine learning algorithms were tested to determine the most effective combination. The best performance was achieved using the SMOTEENN resampling method combined with LinearSVC, reaching an accuracy of 99.3%. These results demonstrate that balancing methods significantly influence spam detection performance and validate the effectiveness of the proposed framework in improving machine learning–based email classification.
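To make the resampling idea concrete, the following is a minimal pure-Python sketch of what a SMOTEENN-style step does: SMOTE oversamples the minority class by interpolating between neighbouring minority points, and an Edited Nearest Neighbours (ENN) pass then removes points whose label disagrees with their neighbours. It is an illustration only, not the authors' code and not the imbalanced-learn implementation. The function names, the 2-D toy points standing in for extracted email features, the binary labels, and the simplified majority-vote ENN rule are all assumptions made for the example.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority point
    and one of its k nearest minority neighbours (the core SMOTE idea).
    Assumes at least two minority points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + t * (o - b) for b, o in zip(base, other)))
    return synthetic


def edited_nearest_neighbours(points, labels, k=3):
    """Simplified ENN: keep a point only if a strict majority of its
    k nearest neighbours share its label."""
    kept_points, kept_labels = [], []
    for i, (p, y) in enumerate(zip(points, labels)):
        nearest = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )[:k]
        votes = sum(1 for j in nearest if labels[j] == y)
        if votes * 2 > k:
            kept_points.append(p)
            kept_labels.append(y)
    return kept_points, kept_labels


def smoteenn(points, labels, minority_label, k=3, seed=0):
    """Oversample the minority class up to the majority size, then clean with ENN.
    Assumes exactly two classes."""
    minority = [p for p, y in zip(points, labels) if y == minority_label]
    n_majority = len(points) - len(minority)
    new_points = smote(minority, n_majority - len(minority), k=k, seed=seed)
    all_points = list(points) + new_points
    all_labels = list(labels) + [minority_label] * len(new_points)
    return edited_nearest_neighbours(all_points, all_labels, k=k)


# Hypothetical toy data: 9 "ham" points (one mislabeled, sitting inside the
# spam cluster) and 3 "spam" points.
ham = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.8), (0.9, 0.4),
       (0.3, 0.3), (0.7, 0.9), (0.2, 0.6), (0.8, 0.1),
       (5.1, 5.0)]
spam = [(5.0, 5.0), (5.5, 4.8), (4.7, 5.3)]
points = ham + spam
labels = ["ham"] * len(ham) + ["spam"] * len(spam)

points_out, labels_out = smoteenn(points, labels, minority_label="spam")
print(labels_out.count("ham"), labels_out.count("spam"))
```

On this toy data, the oversampling step raises the spam class from 3 to 9 points, and the cleaning step drops the single ham point that sits inside the spam cluster. In practice, a resampler like this would be applied to the training split only, before fitting a classifier such as LinearSVC, so that the test data keeps its original class distribution.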
