What question did this study set out to answer?

This research aims to develop and evaluate a Bidirectional LSTM model for spam detection and sentiment analysis in SMS and email communications.

March 28, 2026

Bidirectional LSTM for Spam Detection and Sentimental Analysis

Key Points

Developed a Bidirectional Long Short-Term Memory (BiLSTM) deep learning model.
Preprocessing included stemming, tokenization, and stop-word removal.
Used Word2Vec for feature extraction and classified sentiment polarity with the AFINN and SentiWordNet lexicons.
Compared performance against a Hybrid K-Nearest Neighbors and Support Vector Machine classifier.
BiLSTM achieved an accuracy of 98.77% on the SpamAssassin dataset and 99.11% on the Email dataset.
BiLSTM outperformed the Hybrid KNN-SVM model in accuracy, recall, F1-score, Kappa statistic, MAE, and RMSE on all three datasets (SpamAssassin, SMS, and Email).
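The evaluation metrics listed above can be illustrated with a minimal sketch. It assumes hard 0/1 labels (1 = spam) and computes MAE and RMSE on those labels; the paper may instead compute the error metrics on predicted probabilities, and the function name `binary_metrics` is hypothetical.

```python
import math


def binary_metrics(y_true, y_pred):
    """Return accuracy, recall, F1, Cohen's kappa, MAE, and RMSE for 0/1 labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    n = len(y_true)
    pairs = list(zip(y_true, y_pred))

    # Confusion-matrix counts, with 1 treated as the positive (spam) class.
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / n
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = accuracy
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = ((p_observed - p_expected) / (1 - p_expected)
             if p_expected != 1 else 0.0)

    # Error metrics on hard labels (an assumption of this sketch).
    mae = sum(abs(t - p) for t, p in pairs) / n
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in pairs) / n)

    return {"accuracy": accuracy, "recall": recall, "f1": f1,
            "kappa": kappa, "mae": mae, "rmse": rmse}
```

For example, `binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 1])` gives an accuracy of 0.75 and a kappa of 7/15 (about 0.467). With hard labels, MAE equals the misclassification rate, so it carries the same information as accuracy.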

Abstract

Short Message Service (SMS) and email have become primary vectors for spam, placing a heavy burden on users and mobile network operators. This paper proposes a Bidirectional Long Short-Term Memory (BiLSTM) deep learning model for spam detection and sentiment analysis, evaluated on three benchmark datasets: SpamAssassin, SMS, and Email. The model is compared against a Hybrid K-Nearest Neighbors and Support Vector Machine (Hybrid KNN-SVM) classifier from the prior literature. Preprocessing involves stemming, tokenization, and stop-word removal, followed by Word2Vec-based feature extraction. The BiLSTM network captures both past and future contextual information in text sequences and substantially outperforms the hybrid baseline. On the SpamAssassin dataset, BiLSTM achieves an accuracy of 98.77%, and on the Email dataset it reaches 99.11%. Sentiment polarity is classified using the AFINN and SentiWordNet lexicons. Experimental results confirm that the proposed BiLSTM model yields higher accuracy, recall, F1-score, and Kappa statistic, and lower MAE and RMSE, across all three datasets.
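The preprocessing and lexicon-based polarity steps described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the stop-word set, the toy lexicon and its scores, the naive suffix stripper, and the order of the steps are all hypothetical stand-ins. A real system would load a full stop-word list, use a proper stemmer such as Porter, and read the actual AFINN lexicon, which assigns integer valence scores from -5 to +5.

```python
import re

# Hypothetical stand-ins for a full stop-word list and the AFINN lexicon.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "you", "your",
              "this", "and", "of", "for", "in"}
TOY_LEXICON = {"win": 4, "free": 1, "great": 3,
               "terrible": -3, "bad": -3, "love": 3}


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(token):
    """Naive suffix stripping; a placeholder for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, then stem (step order is an assumption)."""
    return [crude_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


def polarity(tokens, lexicon=TOY_LEXICON):
    """Sum lexicon scores and map the total to a polarity label."""
    score = sum(lexicon.get(t, 0) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Calling `polarity(preprocess("This is a terrible, bad offer"))` returns `"negative"`. In the setup the abstract describes, the preprocessed tokens would then be mapped to Word2Vec vectors and passed to the BiLSTM classifier, which this sketch does not cover.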
