What type of study is this?

This is a Qualitative Study study.

November 30, 2025Open Access

Transformer and Pre-Transformer Model-Based Sentiment Prediction with Various Embeddings: A Case Study on Amazon Reviews

Key Points

Sentiment analysis reveals deep learning enhances predictive accuracy in Amazon reviews, outpacing machine learning methods.
Key metrics include accuracy at 92% and categorical cross-entropy of 0.25, demonstrating effective model calibration and performance.
Evaluation compared embedding techniques like GloVe and FastText across datasets, highlighting subword-level semantic richness.
Findings support sustainable AI practices by offering scalable models for real-world application constraints.

Abstract

Sentiment analysis is essential for understanding consumer opinions, yet selecting the optimal models and embedding methods remains challenging, especially when handling ambiguous expressions, slang, or mismatched sentiment–rating pairs. This study provides a comprehensive comparative evaluation of sentiment classification models across three paradigms: traditional machine learning, pre-transformer deep learning, and transformer-based models. Using the Amazon Magazine Subscriptions 2023 dataset, we evaluate a range of embedding techniques, including static embeddings (GloVe, FastText) and contextual transformer embeddings (BERT, DistilBERT, etc.). To capture predictive confidence and model uncertainty, we include categorical cross-entropy as a key evaluation metric alongside accuracy, precision, recall, and F1-score. In addition to detailed quantitative comparisons, we conduct a systematic qualitative analysis of misclassified samples to reveal model-specific patterns of uncertainty. Our findings show that FastText consistently outperforms GloVe in both traditional and LSTM-based models, particularly in recall, due to its subword-level semantic richness. Transformer-based models demonstrate superior contextual understanding and achieve the highest accuracy (92%) and lowest cross-entropy loss (0.25) with DistilBERT, indicating well-calibrated predictions. To validate the generalisability of our results, we replicated our experiments on the Amazon Gift Card Reviews dataset, where similar trends were observed. We also adopt a resource-aware approach by reducing the dataset size from 25 K to 20 K to reflect real-world hardware constraints. This study contributes to both sentiment analysis and sustainable AI by offering a scalable, entropy-aware evaluation framework that supports informed, context-sensitive model selection for practical applications.

Ask AI

Helpful

Bookmark

View Full Paper