The strategic significance of Large Language Models (LLMs) in economic expansion, innovation, societal development, and national security has been increasingly recognized since the advent of ChatGPT. This study provides a comprehensive comparative evaluation of LLMs developed in the U.S. and China, in both English and Chinese contexts. We propose an evaluation framework that encompasses natural language proficiency, disciplinary expertise, and safety and responsibility, and systematically assess notable models from the U.S. and China across various operational tasks and scenarios. Our key findings show that GPT-4 Turbo leads in English contexts, whereas the Chinese LLM Ernie-Bot 4 stands out in Chinese contexts. The study also highlights disparities in LLM performance across languages and tasks, underscoring the need for linguistically and culturally nuanced model development. The complementary strengths of LLMs developed in the U.S. and China point to the value of cross-national collaboration in advancing LLM technology. The research delineates the current LLM competition landscape and offers insights for policymakers and businesses regarding strategic LLM investments and development.
Jiaxin Li
Zhenhui Jiang
Yang Liu
ACM Transactions on Management Information Systems
University of Hong Kong
Xi'an Jiaotong University
Li et al. studied this question.
www.synapsesocial.com/papers/68d6d8768b2b6861e4c3e7a3 — DOI: https://doi.org/10.1145/3769086