For users who rely on single-use mobile phones, unwanted marketing messages delivered through SMS remain a significant global concern. In recent years, machine learning and deep learning approaches have been explored extensively to address this challenge. This work presents a comparative analysis of several text classification techniques, highlighting the importance of reliably identifying and labeling spam SMS messages. The preprocessing pipeline included duplicate removal, text normalization with spaCy, label encoding, and TF-IDF vectorization, which transforms messages into numerical representations that emphasize uncommon but informative terms over frequent ones. To improve predictive accuracy, the outputs of multiple models were combined using a majority-voting strategy. Among the tested methods, the Relevance Vector Machine (RVM) achieved the strongest performance on the dataset, reaching an F1 score of 0.975176. In addition, this study examined alternative spam detection algorithms, including Logistic Regression, XGBoost, and LightGBM. Two experimental conditions were evaluated: one without handling class imbalance and another with imbalance adjustment. Results showed that ensemble-based methods, particularly Gradient Boosting, XGBoost, and LightGBM, consistently delivered superior performance. Under imbalanced data conditions, both XGBoost and LightGBM achieved F1 scores of 0.99 across the majority and minority classes. When class imbalance was corrected, their performance remained strong, with F1 scores of 0.98 for all classes. Logistic Regression also demonstrated robust results, confirming its role as a reliable baseline. Overall, the findings indicate that the proposed RVM framework is effective for SMS spam classification and has practical applicability in real-world scenarios.
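The duplicate removal, TF-IDF weighting, and majority-voting steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the regex tokenizer is a simple stand-in for spaCy normalization, the smoothed IDF formula is one common variant assumed here, and the function names, toy messages, and per-model predictions are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumerics (stand-in for spaCy normalization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def deduplicate(messages):
    """Drop exact duplicate messages while keeping first-seen order."""
    return list(dict.fromkeys(messages))


def tfidf(messages):
    """Return (vectors, idf): one {term: weight} dict per message plus the IDF table."""
    docs = [tokenize(m) for m in messages]
    n = len(docs)
    # Document frequency: number of messages in which each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed smoothed IDF: terms found in fewer messages get larger weights.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({t: counts[t] * idf[t] for t in counts})
    return vectors, idf


def majority_vote(predictions):
    """Return the label predicted by the largest number of models."""
    return Counter(predictions).most_common(1)[0][0]


# Toy data, invented for illustration only.
messages = deduplicate([
    "Win a free prize now",
    "Are you free for lunch",
    "Win a free prize now",  # exact duplicate, removed above
    "Call me when you are home",
])
vectors, idf = tfidf(messages)

# Hypothetical per-model predictions for a single message.
final_label = majority_vote(["spam", "ham", "spam"])
```

In this toy corpus, "prize" appears in one message and "free" in two, so "prize" receives the larger IDF weight, which is the sense in which TF-IDF favors uncommon terms. With three voters, the label chosen by at least two of them becomes the final prediction.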
Abdel-aziem et al. (Fri,) studied this question.