Large Language Models (LLMs) have transformed natural language processing and achieved unprecedented performance on tasks including machine translation (MT). However, this progress masks underlying asymmetries in training data and model architecture that affect multilingual translation quality. This paper introduces LingualX64, a novel dataset spanning 64 languages, designed to evaluate how strongly these asymmetries affect LLM translation performance, particularly under zero-shot conditions. LingualX64 is constructed to minimize overlap with existing LLM training corpora and to provide a balanced representation of diverse linguistic features, enabling a more robust assessment of cross-linguistic generalization. Our evaluation reveals significant performance disparities across languages, highlighting the impact of data scarcity and linguistic complexity on translation quality. These findings underscore the need for strategies that mitigate asymmetries in LLM training and model design in order to achieve more equitable and robust multilingual translation. LingualX64 provides a benchmark for researchers and developers seeking to address these challenges and to improve LLM-based translation for global communication.
Huang et al. (Sun,) studied this question.