August 31, 2024

Open Access

Efficient clustering of e-mails by applying supervised machine learning algorithms.

Abstract

In today’s digital age, effective detection of unwanted emails, commonly known as “spam”, has become a priority for individuals and organisations alike. As email inboxes fill up with unsolicited messages, it has become evident that the predefined rules and heuristics used by traditional spam filters have lost their effectiveness. This persistent problem poses challenges at both the personal and business level.

Despite efforts to protect email accounts with anti-virus software, which in many cases comes at a cost, spam remains a growing concern. For businesses, implementing costly firewalls can be an unnecessary burden. The problem of spam persists, and its impact on the efficiency and security of email communication is indisputable.

The primary objective of this paper is to investigate and evaluate machine learning algorithms specifically designed to address the challenge of automatic spam detection. This is achieved by using text classification techniques applied to mail servers and personal computers. In particular, three key algorithms are examined: Random Forest, Decision Tree and Naive Bayes, with the intention of determining their applicability in both environments.

This study relies on two essential research methodologies. First, feature selection is carried out: a crucial process that identifies the most relevant variables in mail classification, including keywords and word frequencies. Second, performance evaluation, which uses metrics such as accuracy, recall and F1-score, is employed to understand the performance of the machine learning models in detecting spam and legitimate emails.

The results of this study are presented in the form of comparative tables showing the hit and miss rates of the three models evaluated. Notably, it is determined that the Random Forest model, when applied in conjunction with tokenisation techniques, exhibits superior efficiency compared to the other two models.

The choice of the right machine learning model is critical to ensure efficiency in email classification, and this study provides a solid basis for making informed decisions in the implementation of email security systems in real-world business environments. Spam detection, supported by machine learning algorithms, remains an evolving field and offers a promising solution to address a persistent problem in the digital world.
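The workflow the abstract describes (tokenisation, word-frequency features, a supervised classifier, then accuracy, recall and F1-score) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses only the Python standard library and covers just Naive Bayes, one of the three algorithms named, because it is the simplest to write without external packages. The toy emails, the names `tokenise`, `NaiveBayes` and `evaluate`, and the add-one smoothing are all assumptions introduced here, and the numbers it prints say nothing about the paper's results.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus, invented for illustration only (not the paper's data).
TRAIN_TEXTS = [
    "win a free prize now",
    "free money claim your prize",
    "cheap offer win money now",
    "meeting agenda for monday",
    "please review the project report",
    "lunch on monday with the team",
]
TRAIN_LABELS = ["spam", "spam", "spam", "ham", "ham", "ham"]


def tokenise(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over word frequencies (add-one smoothing assumed)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenise(text))
        # Vocabulary = every word seen in any training email.
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)  # log prior
            for tok in tokenise(text):
                if tok in self.vocab:  # skip words never seen in training
                    count = self.word_counts[c][tok]
                    score += math.log((count + 1) / (total_words + len(self.vocab)))
            scores[c] = score
        return max(scores, key=scores.get)


def evaluate(y_true, y_pred, positive="spam"):
    """Return accuracy, recall and F1-score, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "recall": recall, "f1": f1}


if __name__ == "__main__":
    model = NaiveBayes().fit(TRAIN_TEXTS, TRAIN_LABELS)
    # Hypothetical held-out emails, again invented for illustration.
    test_texts = ["claim your free prize", "project meeting on monday"]
    test_labels = ["spam", "ham"]
    predictions = [model.predict(t) for t in test_texts]
    print(predictions)
    print(evaluate(test_labels, predictions))
```

Running the same `evaluate` function over the predictions of each candidate model on one shared held-out set would give the kind of side-by-side comparison the abstract mentions. A Random Forest or Decision Tree would need a separate implementation or a library, which this sketch deliberately leaves out.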

Authors

Daniel Iván Quirumbay Yagual

B. Mendez

Victor Ruiz

Journals

Journal of Applied Research and Technology
