August 21, 2025

Comparison of Machine Learning for Sentiment Analysis in Movie Reviews

Mirza Jahanzaib (University of Engineering and Technology Lahore); Md. Shafiur Raihan Shafi (Uttara University); Saeed Hossain Moheb

Key Points

BERT achieved superior accuracy and F1-score in sentiment analysis across all movie review datasets, while Logistic Regression reached 88% accuracy on IMDb.
The analysis covered six classifiers, including Naïve Bayes and Random Forest, each evaluated on three benchmark datasets using standard performance metrics.
Preprocessing steps included tokenization and feature extraction methods like Count Vectorizer and TF-IDF to prepare movie reviews for analysis.
Findings highlight that dataset characteristics significantly influence classification performance in real-world sentiment analysis applications.
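
As a rough illustration of the preprocessing and feature-extraction steps named above, the sketch below tokenizes a few made-up reviews, removes stop words, and builds raw count and TF-IDF vectors using only the Python standard library. It is not the authors' pipeline: the stop-word list, the sample reviews, the function names, and the smoothed IDF formula (one common variant) are all assumptions chosen for illustration, and stemming is omitted for brevity.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "was", "and", "of", "this", "it"}


def tokenize(text):
    """Lowercase the text, keep runs of letters/apostrophes, drop stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def count_vectors(docs):
    """Return (sorted vocabulary, one raw term-count vector per document)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors


def tfidf_vectors(docs):
    """Weight raw counts by a smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    vocab, counts = count_vectors(docs)
    n = len(docs)
    # Document frequency: in how many documents each vocabulary term occurs.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n) / (1 + d)) + 1.0 for d in df]
    weighted = [[c * w for c, w in zip(row, idf)] for row in counts]
    return vocab, weighted


# Hypothetical reviews, used only to exercise the functions.
reviews = [
    "This movie was great and the acting was great",
    "The plot was boring",
    "Great plot, boring acting",
]
vocab, counts = count_vectors(reviews)
_, tfidf = tfidf_vectors(reviews)
```

In this toy corpus the term "movie" appears in only one review, so its TF-IDF weight is higher than that of an equally frequent term that appears in two reviews. That down-weighting of widespread terms is what distinguishes TF-IDF from raw counts.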

Abstract

This study presents a comparative analysis of machine learning and deep learning algorithms for sentiment classification in movie reviews. Three benchmark datasets, IMDb (50K and 20K reviews) and Rotten Tomatoes, were used to evaluate six classifiers: Naïve Bayes, Logistic Regression, Support Vector Machine, Decision Tree, Random Forest, and BERT. Preprocessing included tokenization, stop-word removal, stemming, and feature extraction using Count Vectorizer and TF-IDF. Evaluation metrics such as accuracy, precision, recall, sensitivity, specificity, and F1-score were used to assess model performance. Logistic Regression achieved 88% accuracy on the IMDb dataset, while Random Forest exhibited the highest specificity. BERT outperformed traditional models in both accuracy and F1-score across all datasets, particularly in handling informal and context-heavy language. The results highlight the impact of dataset characteristics on classification performance and provide insights for deploying sentiment analysis in real-world applications like recommendation systems and audience profiling.
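
For readers unfamiliar with the evaluation metrics listed in the abstract, the following sketch shows how they can be derived from the four confusion-matrix counts of a binary (positive/negative) sentiment classifier. The labels and predictions are invented for illustration and are not results from the paper; the function names are hypothetical. Note that recall and sensitivity are two names for the same quantity, so the sketch reports it once under both keys.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true, y_pred, positive=1):
    """Compute the standard metrics from the confusion-matrix counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # identical to sensitivity
    specificity = safe_div(tn, tn + fp)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,
        "specificity": specificity,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Made-up labels: 1 = positive review, 0 = negative review.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
```

Specificity measures how many truly negative reviews are labelled negative, which is why a model can lead on specificity (as reported for Random Forest) without leading on accuracy or F1-score.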
