Online review platforms have become crucial decision-making tools in the hospitality industry, where automated sentiment analysis and rating prediction offer valuable insights for both businesses and consumers. This study investigates the performance of transformer-based language models for predicting hotel review ratings and examines the impact of oversampling on model accuracy. We introduce a novel dataset of 68,785 English-language TripAdvisor reviews (2014-2023) of hotels in Turkey. Four transformer models (BERT, DistilBERT, RoBERTa, and DeBERTa) are systematically compared from multiple perspectives. The results show that DeBERTa achieves the highest performance among the evaluated models. Random oversampling (ROS) significantly improved classification performance, with F1-scores increasing from 62% to 81% and accuracy from 76% to over 82% across all models. The oversampling approach effectively addressed class imbalance while preserving semantic information, enabling better distinction between rating categories. Through quantitative and qualitative analysis, including embedding visualization and SHAP-based interpretability studies, we demonstrate that transformer models effectively capture sentiment patterns, although they remain sensitive to mixed sentiments and linguistic subtleties. This work contributes a novel dataset, a systematic comparison of four transformer models, and empirical evidence of the effectiveness of oversampling in sentiment analysis.
Topçu et al. (Mon,) studied this question.
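
For readers unfamiliar with random oversampling, the idea can be sketched in a few lines: minority rating classes are padded with randomly drawn duplicates of their own reviews until every class matches the size of the largest one. The sketch below is illustrative only and is not the authors' implementation; the function name `random_oversample`, the `seed` parameter, and the toy reviews are assumptions introduced here. Because whole reviews are duplicated unchanged, the text of each sample is left intact.

```python
import random
from collections import Counter, defaultdict


def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class samples at random until all classes
    have as many samples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the reviews by their rating label.
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)

    # Every class is grown to the size of the majority class.
    target = max(len(items) for items in by_class.values())

    pairs = []
    for label, items in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        pairs.extend((text, label) for text in items + extra)

    # Shuffle so that duplicated samples are not clustered together.
    rng.shuffle(pairs)
    return [t for t, _ in pairs], [l for _, l in pairs]


# Toy data (invented for illustration): ratings 5, 3, and 1 are imbalanced.
texts = ["great stay", "lovely pool", "superb staff", "perfect view",
         "average room", "okay breakfast", "dirty bathroom"]
labels = [5, 5, 5, 5, 3, 3, 1]

new_texts, new_labels = random_oversample(texts, labels)
print(Counter(new_labels))
```

Oversampling of this kind is usually applied to the training split only, so that duplicated reviews do not leak into validation or test data; whether the study follows that protocol is not stated in the abstract.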