What question did this study set out to answer?

This research aims to improve sentiment analysis and topic modeling using RoBERTa-Large to better understand consumer behavior in online shopping.

April 27, 2026 · Open Access

Analyzing digital consumer insights through RoBERTa LLM based sentiment analysis and topic modeling

Key points

Utilized RoBERTa-Large for sentiment classification on consumer reviews.
Implemented Latent Dirichlet Allocation (LDA) for topic modeling on public datasets.
Compared RoBERTa with traditional machine learning models and deep learning architectures for performance evaluation.
RoBERTa-Large achieved the highest accuracy of 93.59%, outperforming all baseline models.
RoBERTa-Large's contextual embeddings and attention mechanisms captured complex semantic relationships beyond surface-level word frequencies.
SHAP and LIME techniques enhanced transparency and trustworthiness of model predictions.
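
As a rough illustration of the TF-IDF features that the baseline models rely on, the sketch below computes term weights for two toy reviews in plain Python. The study does not describe its tokenization or weighting variant, so the whitespace tokenizer, the smoothed IDF formula, the function name `tfidf`, and the example reviews are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (tf * smoothed idf)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # Smoothed idf (an assumed variant): log((1 + N) / (1 + df)) + 1
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return weights


# Hypothetical reviews, not taken from the study's datasets.
reviews = [
    "not good at all",
    "good value and fast delivery",
]
vectors = tfidf(reviews)
for review, vector in zip(reviews, vectors):
    print(review, {term: round(w, 3) for term, w in vector.items()})
```

The weight assigned to "good" depends only on how often it occurs, not on the "not" in front of it. This is the kind of surface-level signal that contextual embeddings are meant to go beyond.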

Abstract

Understanding consumer behavior in the context of online shopping is critical for businesses to adapt to evolving market trends. Customer reviews serve as a rich source of information reflecting consumer sentiments and preferences. Sentiment analysis of these reviews has become a powerful tool to uncover underlying consumer emotions and purchasing trends. However, traditional methods relying on shallow lexical features and classical machine learning algorithms often fall short in capturing the intricate and contextual patterns present in textual data. In this study, we propose the use of the large language model RoBERTa-Large to enhance sentiment classification performance by leveraging its advanced contextual embeddings and attention mechanisms. This approach enables the capture of complex semantic relationships beyond surface-level word frequencies. Alongside sentiment analysis, we apply topic modeling using Latent Dirichlet Allocation (LDA) on publicly available datasets to identify prevalent themes and topics within consumer feedback. We perform a comprehensive comparison of RoBERTa against traditional machine learning and ensemble models using TF-IDF features, as well as deep learning architectures utilizing sentence embeddings and transformer-based models. Experimental results demonstrate that RoBERTa-Large achieves the highest accuracy of 93.59%, significantly outperforming baseline models. To enhance model transparency and trustworthiness, we apply SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) interpretability techniques, providing meaningful explanations of model predictions at both global and local levels.
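
To give a sense of what a Shapley-based explanation computes, the sketch below derives exact Shapley values for the words of one toy review. It does not use the `shap` library and is not the authors' pipeline: the lexicon, the `score` function standing in for a classifier, and the example review are all hypothetical. Exact enumeration over all word subsets is only feasible for a handful of features, which is why SHAP relies on approximations in practice.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy lexicon standing in for a trained sentiment model.
LEXICON = {"great": 1.0, "slow": -1.0, "not": 0.0}


def score(words):
    """Toy sentiment score for a set of words (assumed model, illustration only)."""
    present = set(words)
    total = sum(LEXICON.get(word, 0.0) for word in present)
    # Interaction term: "not" together with "great" flips +1 to -1.
    if "not" in present and "great" in present:
        total -= 2.0
    return total


def shapley_values(words, value_fn):
    """Exact Shapley value of each word; words must be unique."""
    n = len(words)
    phi = {}
    for word in words:
        others = [w for w in words if w != word]
        contribution = 0.0
        for k in range(len(others) + 1):
            # Weight of a coalition of size k in the Shapley formula.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                marginal = value_fn(subset + (word,)) - value_fn(subset)
                contribution += weight * marginal
        phi[word] = contribution
    return phi


review = ("not", "great", "delivery")
phi = shapley_values(review, score)
for word, value in phi.items():
    print(f"{word}: {value:+.3f}")
```

The attributions sum to the difference between the score of the full review and the score of an empty input, which is the additivity property that makes these values usable as a local explanation of a single prediction.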
