What type of study is this?

This is a Quantitative Study study.

October 16, 2025Open Access

Evaluating Large Language Models for Brazilian Portuguese Sentiment Analysis: A Comparative Study of Multilingual State-of-the-Art vs. Brazilian Portuguese Fine-Tuned LLMs

Key Points

Large-scale models achieved over 92% accuracy, showcasing their effectiveness for sentiment analysis in Brazilian Portuguese.
Thirteen multilingual models and ten fine-tuned models were evaluated across 12 datasets, providing a comprehensive assessment.
Some fine-tuning improved performance and reduced hallucination rates, but results varied significantly across models.
Newer model generations consistently outperformed older versions, highlighting advancements in large language models.

Abstract

This study presents an extensive comparative analysis of Large Language Models (LLMs) for sentiment analysis in Brazilian Portuguese texts. We evaluated 23 LLMs—comprising 13 state-of-the-art multilingual models and 10 models specifically fine-tuned for Portuguese—across 12 public annotated datasets from diverse domains, employing the in-context learning paradigm. Our findings demonstrate that large-scale models such as Claude-3. 5-Sonnet, GPT-4o, DeepSeek-V3, and Sabiá-3 delivered superior results with accuracies exceeding 92%, while smaller models (7-13B parameters) also showed compelling performance with top performers achieving accuracies above 90%. Notably, linguistic specialization through fine-tuning demonstrated mixed results—significantly reducing hallucination rates for some models but not consistently yielding performance improvements across all model types. We also observed that newer model generations frequently outperformed their predecessors, and in the one dataset where traditional machine learning methods were employed by the original authors for sentiment classification, all evaluated LLMs substantially surpassed these traditional approaches. Moreover, smaller-scale models exhibited a tendency toward overgeneration despite explicit instructions. These findings contribute valuable insights to the discourse on language-specific model optimization and establish empirical benchmarks for both multilingual and Portuguese-specialized LLMs in sentiment analysis tasks.

Evaluating Large Language Models for Brazilian Portuguese Sentiment Analysis: A Comparative Study of Multilingual State-of-the-Art vs. Brazilian Portuguese Fine-Tuned LLMs

Key Points

Abstract

Cite This Study