Continuous advances in natural language processing (NLP) have produced highly effective models such as BERT, RoBERTa, GPT-4, Llama 3, and Gemini. However, adapting these models to specific dialects remains underexplored, especially for languages other than English and for slang or informal language. In response to this need, our research evaluates which monolingual Spanish models best fit Peruvian colloquial expressions. We constructed a specialized dataset of 11,276 manually annotated social media comments, preprocessed to retain the unique features of Peruvian slang. We also expanded the models’ vocabulary using a dictionary of Peruvian slang so that local expressions are better recognized. The dataset was used to fine-tune the models; RoBERTuito performed best, achieving an F1-score of 0.750 and significantly outperforming BETO (0.661), BERTuit (0.700), and RoBERTa-BNE (0.696). This research offers a robust solution for sentiment analysis in Peruvian Spanish and sets a benchmark for adapting monolingual models to specific linguistic contexts, with applications extending to other dialects and informal language variants.
Calizaya-Milla et al. studied this question.
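The models above are compared by F1-score, so a short worked example of the metric may help. The abstract does not say which averaging scheme was used or how many sentiment classes the dataset has; the sketch below assumes macro-averaging over three hypothetical labels (`pos`, `neg`, `neu`), and the toy labels are invented for illustration, not drawn from the 11,276-comment dataset.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred.

    Macro-averaging is an assumption; the source does not state the averaging used.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")

    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1   # correct prediction for this class
        else:
            fp[pred] += 1   # predicted class receives a false positive
            fn[gold] += 1   # gold class receives a false negative

    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, for illustration only.
gold = ["pos", "neg", "neu", "pos", "neg", "neu"]
pred = ["pos", "neg", "pos", "pos", "neu", "neu"]
print(f"macro F1 = {macro_f1(gold, pred):.3f}")
```

Macro-averaging weights every class equally, which matters when sentiment classes are imbalanced; a weighted or micro average would generally give different numbers on the same predictions.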