What does this research mean for the field?

Among the four transformer-based language models compared, DeBERTa achieves the highest performance for predicting hotel review ratings, and applying random oversampling significantly improves classification accuracy and F1-scores by addressing class imbalance. Novelty: incremental. Consensus alignment: neutral.

What question did this study set out to answer?

This research aims to improve the prediction of hotel review ratings using transformer models and assess the impact of oversampling techniques.

June 3, 2026 | Open Access

Improving Hotel Review Rating Prediction with Transformer Models

Key Points

This research aims to improve the prediction of hotel review ratings using transformer models and assess the impact of oversampling techniques.
Systematically compared four transformer models: BERT, DistilBERT, RoBERTa, and DeBERTa.
Introduced a novel dataset of 68,785 English-language TripAdvisor hotel reviews from Turkey (2014-2023).
Applied random oversampling techniques to enhance model accuracy and address class imbalance.
DeBERTa achieved the highest performance among all models evaluated.
F1-scores improved from 62% to 81% with random oversampling, and accuracy increased from 76% to over 82%.
Transformers captured sentiment patterns effectively but showed sensitivity to mixed sentiments and linguistic nuances.
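
The gap between the reported accuracy and F1-scores is typical of imbalanced rating data. The sketch below is a minimal illustration on invented labels, not the study's evaluation code. It assumes macro-averaged F1, which the summary does not specify, and the function names and toy labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging; not stated in the source)."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, invented labels: eight 5-star reviews and two 1-star reviews.
y_true = [5] * 8 + [1] * 2
# A classifier that always predicts the majority class.
y_pred = [5] * 10

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.2f} macro_f1={f1:.2f}")
```

On this toy input the always-majority classifier scores well on accuracy while macro F1 is much lower, because the minority class contributes an F1 of zero. This is the kind of gap that rebalancing the training data is meant to narrow.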

Abstract

Online review platforms have become crucial decision-making tools in the hospitality industry, where automated sentiment analysis and rating prediction offer valuable insights for both businesses and consumers. This study investigates the performance of transformer-based language models for predicting hotel review ratings and examines the impact of oversampling techniques on model accuracy. We introduce a novel dataset of 68,785 English hotel reviews from TripAdvisor (2014-2023) in Turkey. Four transformer models, i.e., BERT, DistilBERT, RoBERTa, and DeBERTa, were systematically compared from multiple perspectives. Results show that DeBERTa achieves the highest performance among all evaluated models. Random oversampling (ROS) significantly improved classification performance, with F1-scores increasing from 62% to 81% and accuracy from 76% to over 82% across all models. The oversampling approach effectively addressed class imbalance while preserving semantic information, enabling better distinction between rating categories. Through quantitative and qualitative analysis, including embedding visualization and SHAP-based interpretability studies, we demonstrate that transformer models effectively capture sentiment patterns. However, they remain sensitive to mixed sentiments and linguistic subtleties. This work contributes a novel dataset, a systematic comparison of four transformer models, and empirical evidence of oversampling effectiveness in sentiment analysis.
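
For readers unfamiliar with random oversampling, the following is a minimal sketch of the general technique, not the authors' implementation. It duplicates randomly chosen minority-class reviews until every rating class is as large as the largest one. The function name, the seed, and the toy reviews are hypothetical. It is shown on a training set only, a common practice that the abstract does not confirm for this study.

```python
import random
from collections import Counter, defaultdict


def random_oversample(texts, labels, seed=0):
    """Resample minority classes with replacement up to the majority-class size."""
    rng = random.Random(seed)

    # Group the examples by their rating label.
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_class.values())

    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Draw the missing examples at random from the existing ones.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)

    # Shuffle so duplicated examples are not clustered by class.
    order = list(range(len(out_texts)))
    rng.shuffle(order)
    return [out_texts[i] for i in order], [out_labels[i] for i in order]


# Toy, invented training data: four 5-star, one 3-star, one 1-star review.
texts = ["great stay", "lovely pool", "friendly staff", "clean room",
         "average breakfast", "dirty room"]
labels = [5, 5, 5, 5, 3, 1]

new_texts, new_labels = random_oversample(texts, labels)
counts = Counter(new_labels)
print(counts)
```

Because the added examples are exact copies of real reviews, no synthetic text is introduced, which is consistent with the abstract's remark that the approach preserves semantic information.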

Cite This Study

Topçu et al. studied this question.

synapsesocial.com/papers/6a1fc6f7dee9eb8c0dce7d09 https://doi.org/10.35377/saucis...1748175
