Large Language Models (LLMs) like ChatGPT have great potential to influence how people search for information and make decisions. These effects are already felt in many fields, so it is necessary to better understand how LLMs work and how their linguistic competencies and decision-making heuristics compare to those of humans. Benchmarks for these skills have been developed primarily in English. Because of disparities in training data across languages, LLM performance in English may not carry over to other languages, so it is essential to extend evaluation beyond English. In this study, we evaluate ChatGPT's performance on world knowledge and commonsense reasoning tasks in Japanese and compare it with human performance. Our results show that ChatGPT achieves lower accuracy than humans. We also report differences in naturalness judgements. These results shed light on the differences between humans and LLMs and can inform future studies.
Reese et al. studied this question.