May 11, 2024 | Open Access

Comparing ChatGPT and Humans on World Knowledge and Common-sense Reasoning Tasks: A case study of the Japanese Winograd Schema Challenge

Abstract

Large Language Models (LLMs) like ChatGPT have great potential to influence the way people search for information and make decisions. These effects are already felt in many fields, so it is necessary to better understand how these models work and how their linguistic competencies and decision-making heuristics compare to those of humans. Benchmark tests of these skills have been developed primarily in English. Because the amount of training data differs across languages, LLM performance in English may not carry over to other languages, so it is essential to extend evaluation beyond English. In this study we evaluate ChatGPT's performance on world knowledge and commonsense reasoning tasks against human performance in Japanese. Our results show that ChatGPT achieves lower accuracy than humans. We also report differences in naturalness judgements. These results shed light on the differences between humans and LLMs and can inform future studies.
