November 24, 2025 · Open Access

Human tests for machine models: What lies “Beyond the Imitation Game”?

Abstract

Benchmarking large language models (LLMs) is a key practice for evaluating their capabilities and risks. This paper considers the development of “BIG Bench,” a crowdsourced benchmark designed to test LLMs “Beyond the Imitation Game.” Drawing on linguistic anthropological and ethnographic analysis of the project's GitHub repository, we examine how contributors developed tasks based on their lay understandings of language, cognition, and intelligence. By tracing how contributors make implicit judgments about what constitutes a meaningful test of intelligence, we show how widespread language ideologies shape the evaluation of LLMs and the imaginaries that guide their development.

Author: Anna Weichselbraun

DOI: https://doi.org/10.1111/jola.70035