This survey presents a comprehensive review of Large Language Models (LLMs) developed for the Arabic language and its dialects. We categorize models by architecture (encoder-only, decoder-only, and encoder-decoder) and by linguistic form: Classical Arabic, Modern Standard Arabic (MSA), and Dialectal Arabic. We analyze monolingual, bilingual, and multilingual models, evaluating their performance on tasks such as sentiment analysis, named entity recognition, and question answering. We also assess model openness, considering access to source code, training data, weights, and documentation. Our findings highlight a concentration of resources on MSA, a lack of diverse dialectal datasets, and limited transparency across many models. This work offers the first systematic comparison of openness and linguistic coverage in Arabic LLMs and outlines key challenges and research opportunities to support more inclusive, reproducible, and representative Arabic NLP.
Malak Mashabi
Shahad Al-Khalifa
Hend Al-Khalifa
ACM Transactions on Asian and Low-Resource Language Information Processing
King Saud University
Mashabi et al. studied this question.
www.synapsesocial.com/papers/69d9e62078050d08c1b765bd — DOI: https://doi.org/10.1145/3807946