What question did this study set out to answer?

This research aims to evaluate and compare the effectiveness of traditional machine learning techniques with Large Language Models (LLMs) in classifying SMS spam.

June 18, 2026 | Open Access

Future SMS spam filtering: comparative fine-tuning of machine learning and LLMs

Key Points

Studied SMS spam classification using a comprehensive dataset of diverse samples.
Fine-tuned three LLMs (Phi-3.5 Classifier, H2O-Danube, and DistilBERT) to optimize performance.
Implemented preprocessing techniques including tokenization, case transformation, and stopword filtering.
Phi-3.5 Classifier and H2O-Danube each achieved 99% accuracy, precision, recall, and F1-score.
DistilBERT also reached 99% on all four metrics.
LLMs significantly outperformed traditional methods such as Support Vector Machine (SVM), Decision Tree (DT), and Naïve Bayes (NB) in accuracy.
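
The preprocessing steps named in the key points (tokenization, case transformation, and stopword filtering) can be sketched roughly as follows. This is a minimal illustration, not the study's pipeline: the function name `preprocess`, the regex-based tokenizer, and the tiny stopword set are assumptions made for the example, since this summary does not say which tokenizer or stopword list was used.

```python
import re

# Hypothetical, deliberately tiny English stopword set for illustration only;
# the study's actual stopword list is not given in this summary.
STOPWORDS = {"a", "an", "the", "to", "you", "have", "is", "and", "of", "for"}


def preprocess(message):
    """Lowercase, tokenize, and drop stopwords from one SMS message."""
    lowered = message.lower()                   # case transformation
    tokens = re.findall(r"[a-z0-9]+", lowered)  # simple regex tokenization
    return [t for t in tokens if t not in STOPWORDS]  # stopword filtering


if __name__ == "__main__":
    print(preprocess("You have WON a FREE prize! Call 0800 now to claim."))
```

A real system would more likely rely on the tokenizer shipped with the chosen model or on an established stopword list; the regex here only keeps ASCII letters and digits, which is enough to show the three steps in order.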

Abstract

Classifying short message service (SMS) spam is critical for identifying unauthorized and potentially harmful messages, especially given the growing number of crimes associated with such communications. This study compares the effectiveness of Large Language Models (LLMs) with that of traditional machine learning techniques in SMS spam classification. The results show that LLMs outperform commonly used traditional methods, including Support Vector Machine (SVM), Decision Tree (DT), and Naïve Bayes (NB), a finding that sets this research apart from prior work. To ensure robust evaluation, the study uses a comprehensive dataset of diverse SMS spam samples, together with preprocessing techniques such as tokenization, case transformation, and English stopword filtering. Three LLMs (Phi-3.5 Classifier, H2O-Danube, and DistilBERT) were fine-tuned to optimize performance. In the experiments, the Phi-3.5 Classifier and H2O-Danube achieved identical results of 99% for accuracy, precision, recall, and F1-score. DistilBERT also performed exceptionally well, reaching 99% across the same metrics. These results significantly surpass those of the traditional machine learning models, highlighting the superior accuracy of LLMs in spam classification. The findings have profound implications for integrating LLMs to enhance sentiment analysis, improve spam detection systems, and establish performance benchmarks for LLM-based sentiment analysis in SMS spam detection, which can in turn strengthen SMS communication security and increase the overall efficiency of spam mitigation strategies.
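
For readers unfamiliar with the four reported metrics, the sketch below shows how accuracy, precision, recall, and F1-score are conventionally computed for a binary spam/ham task. The function name `binary_metrics`, the label strings `"spam"` and `"ham"`, and the toy labels are assumptions for illustration; they are not the study's data, code, or results.

```python
def binary_metrics(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # spam caught
    fp = sum(t != positive and p == positive for t, p in pairs)  # ham flagged
    fn = sum(t == positive and p != positive for t, p in pairs)  # spam missed
    tn = sum(t != positive and p != positive for t, p in pairs)  # ham passed

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up labels, purely to exercise the function.
    truth = ["spam", "ham", "ham", "spam", "ham", "spam", "ham", "ham"]
    preds = ["spam", "ham", "spam", "spam", "ham", "ham", "ham", "ham"]
    print(binary_metrics(truth, preds))
```

Because spam datasets are often imbalanced, precision and recall on the spam class are usually more informative than accuracy alone, which is presumably why the study reports all four.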
