Although natural language understanding has made significant progress, language models still struggle to grasp sarcasm, a complex linguistic phenomenon shaped by cultural and contextual differences. The problem is even more pronounced in multilingual settings. This study assesses the efficacy of base-scale pre-trained models, namely Bidirectional Encoder Representations from Transformers (BERT), Robustly Optimized BERT Pretraining Approach (RoBERTa), Unsupervised Cross-lingual Representation Learning at Scale (XLM-RoBERTa), and a distilled version of BERT (DistilBERT), under task-specific fine-tuning, and of large language models (LLMs), represented by GPT-4, in few-shot settings, across three languages: English, Spanish, and Amharic. While we primarily focus on multilingual sarcasm detection, we also provide monolingual benchmarks to evaluate language-specific adaptation. Among the fine-tuned models, RoBERTa-base achieves the best multilingual generalization (F1: 0.82), while BERT performs best on English (F1: 0.90), suggesting that these models adapt particularly well to English. In contrast, GPT-4o with a few-shot strategy shows limited sarcasm comprehension (F1: 0.65), despite its stronger general language interpretation. This indicates that, although LLMs exhibit greater flexibility, base-scale models fine-tuned on task-specific data remain superior at detecting multilingual sarcasm. Finally, we believe this work offers practical guidance for choosing a model when resources are limited and underscores the importance of sarcasm detection systems that can adapt to different cultures.
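The F1 scores quoted in the abstract combine precision and recall into a single number. The following is a minimal sketch of that computation for a binary sarcastic/not-sarcastic task. The abstract does not state which F1 variant (binary, macro, or weighted) the authors report, so the binary form is an assumption, and the function name `f1_score_binary` and the toy labels are hypothetical, not taken from the paper.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Binary F1 for the `positive` class (assumed here to mean 'sarcastic')."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        # No correct positive predictions: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels invented for illustration (1 = sarcastic, 0 = not sarcastic).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
score = f1_score_binary(y_true, y_pred)
print(f"F1: {score:.2f}")
```

On these toy labels there are three true positives, one false positive, and one false negative, so precision and recall are both 0.75 and the F1 is 0.75.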
Girma Yohannis Bade
Olga Kolesnikova
José Luis Oropeza
PeerJ Computer Science
Bade et al. studied this question.
www.synapsesocial.com/papers/69a288170a974eb0d3c04149 — DOI: https://doi.org/10.7717/peerj-cs.3584