Large Language Models (LLMs) like ChatGPT have great potential to influence the way people search for information and make decisions. Because those effects are already felt in many fields, it is necessary to better understand how these models work and how their linguistic competencies and decision-making heuristics compare to those of humans. Benchmark tests of these skills have been developed primarily in English. Due to disparities in training data across languages, LLM performance in English may not translate to other languages, so it is essential to expand the domain of evaluation beyond English. In this study, we evaluate ChatGPT's performance on world knowledge and commonsense reasoning tasks vis-à-vis human performance in Japanese. Our results show that ChatGPT achieves lower accuracy than humans. We also report differences in naturalness judgements. These results shed light on the differences between humans and LLMs and can inform future studies.
Reese et al. studied this question.
www.synapsesocial.com/papers/68e6a89ab6db64358762bd09 — DOI: https://doi.org/10.1145/3613905.3650975
May Lynn Reese
Anastasia Smirnova
San Francisco State University