What question did this study set out to answer?

This research aims to evaluate and compare various machine learning models for spam detection across different digital communication systems.

February 26, 2026 | Open Access

Machine Learning Based Spam Detection in Digital Communication Systems: A Comparative Analysis

Key Points

Conducted a comparative analysis of several basic machine learning models
Utilized popular benchmark datasets from multiple communication platforms
Assessed accuracy, precision, recall, and F1-score of the models
Tested model performance on unseen datasets
ML models achieved good accuracy, precision, recall, and F1-score on benchmark datasets
A notable performance drop was observed when models were tested on unseen datasets
Recommendations for enhancements were proposed to improve performance on unseen data
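
The four metrics named in the key points can be computed from confusion counts. The sketch below is illustrative only: it assumes a binary task with "spam" as the positive class, and the function name and the labels are made up for the example. The study may compute or average these metrics differently.

```python
def classification_metrics(y_true, y_pred, positive="spam"):
    """Return accuracy, precision, recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    # Confusion counts relative to the positive ("spam") class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels, not data from the study.
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "spam", "ham", "ham", "ham", "ham"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Reporting precision and recall alongside accuracy matters for spam filtering because the classes are usually imbalanced, so a high accuracy alone can hide many missed spam messages or many blocked legitimate ones.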

Abstract

Spam messages are unwanted, irrelevant, or potentially harmful messages sent in bulk to large numbers of recipients via email, SMS, or social media. They pose a threat to individual users and commercial companies alike, and they undermine digital communication platforms by enabling phishing, malware distribution, service disruption, and unsolicited advertising. Several mechanisms have been used in the literature to detect spam in digital communication systems, including rule-based filtering, Bayesian filtering, heuristic analysis, and machine learning (ML) techniques. Traditional rule-based and heuristic approaches have proven insufficient to cope with evolving attack patterns, whereas ML models can offer modern, dynamic, and efficient solutions. This study evaluates and compares several basic ML models for spam detection on popular benchmark datasets drawn from several communication platforms. The experimental results demonstrate that the tested models achieve good accuracy, precision, recall, and F1-score on each investigated benchmark dataset. However, the performance of all models decreased drastically when the trained models were tested on an unseen dataset. Recommendations are provided for future enhancements to address this drop in the performance of ML techniques on unseen datasets. Finally, additional experiments show the positive impact of applying some of these recommendations.
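
The evaluation protocol described in the abstract (train on one benchmark, then test both on held-out data from the same source and on an unseen dataset) might be sketched as follows. This is a minimal sketch under stated assumptions, not the study's implementation: the `TinyNaiveBayes` class, the whitespace tokenizer, and all messages are hypothetical, and the toy data stands in for real email and SMS benchmarks.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; real pipelines normalize far more.
    return text.lower().split()


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict_one(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for c in self.classes:
            score = math.log(self.doc_counts[c] / total_docs)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # words unseen in training are ignored
                    score += math.log((self.word_counts[c][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: email-style training set, email-style test set,
# and an SMS-style "unseen" set with different vocabulary.
email_train = [("win a free prize now", "spam"),
               ("claim your free money today", "spam"),
               ("meeting agenda for monday", "ham"),
               ("please review the attached report", "ham")]
email_test = [("free prize money", "spam"),
              ("monday report agenda", "ham")]
sms_unseen = [("u won txt 2 claim", "spam"),
              ("ur acct is locked txt back", "spam"),
              ("c u at lunch", "ham")]

model = TinyNaiveBayes().fit(*zip(*email_train))
in_domain = accuracy([y for _, y in email_test],
                     model.predict([x for x, _ in email_test]))
cross_dataset = accuracy([y for _, y in sms_unseen],
                         model.predict([x for x, _ in sms_unseen]))
print(f"in-domain accuracy: {in_domain:.2f}")
print(f"cross-dataset accuracy: {cross_dataset:.2f}")
```

In this toy setup, messages whose words never appeared in training give the classifier nothing to score, which is one plausible contributor to weaker cross-dataset results. The toy numbers say nothing about the magnitude or cause of the drop reported in the study.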
