What type of study is this?

This is a quantitative study.

September 24, 2025 · Open Access

AI Development and Innovation: A Comparison of Large Language Models from the U.S. and China

Key Points

GPT-4 Turbo leads in English contexts, while Ernie-Bot 4 performs best in Chinese contexts.
The proposed evaluation framework assesses natural language proficiency, disciplinary expertise, and safety and responsibility across models.
Performance disparities across languages and tasks underscore the need for linguistically and culturally aware LLM development.
The study underscores the importance of cross-national collaboration in advancing language model technology.

Abstract

The strategic significance of Large Language Models (LLMs) for economic expansion, innovation, societal development, and national security has been increasingly recognized since the advent of ChatGPT. This study provides a comprehensive comparative evaluation of LLMs developed in the U.S. and China, in both English and Chinese contexts. We propose an evaluation framework that encompasses natural language proficiency, disciplinary expertise, and safety and responsibility, and we use it to systematically assess notable models from the U.S. and China across various operational tasks and scenarios. Our key findings show that GPT-4 Turbo leads in English contexts, whereas the Chinese LLM Ernie-Bot 4 stands out in Chinese contexts. The study also reveals disparities in LLM performance across languages and tasks, underscoring the need for linguistically and culturally nuanced model development. The complementary strengths of LLMs developed in the U.S. and China highlight the value of cross-national collaboration in advancing LLM technology. The research delineates the current competitive landscape of LLMs and offers insights for policymakers and businesses regarding strategic LLM investment and development.
