February 27, 2024 · Open Access

Comparative Evaluation of Commercial Large Language Models on PromptBench: An English and Chinese Perspective

Abstract

This study examines the performance gap between English and Chinese in large language models (LLMs), motivated by the growing need for multilingual capabilities in artificial intelligence systems. Using a methodology that combines quantitative analysis of model outputs with qualitative assessment of linguistic nuances, the research investigates the causes of these discrepancies. The findings show significant differences in LLM performance between the two languages, with models having notably more difficulty processing and generating Chinese text. This gap highlights the limitations of current models in handling languages with distinct grammatical structures and cultural contexts. The results point to a critical need for more robust and inclusive models that better accommodate linguistic diversity. Meeting that need requires both enriching training datasets with a wider range of languages and refining model architectures to capture the subtleties of different linguistic systems. Ultimately, this study contributes to ongoing work on improving the multilingual capabilities of LLMs, with the aim of building more equitable and effective artificial intelligence tools for a global user base.

Cite This Study

Wang et al. (2024). Comparative Evaluation of Commercial Large Language Models on PromptBench: An English and Chinese Perspective.

synapsesocial.com/papers/68e77566b6db6435876e9fa9
https://doi.org/10.21203/rs.3.rs-3987793/v1