January 2, 2026 | Open Access

LLMs achieve adult human performance on higher-order theory of mind tasks

Abstract

This paper builds on prior work by introducing a handwritten test suite, Multi-Order Theory of Mind Q&A, and using it to compare the performance of five LLMs of varying sizes and training paradigms to a newly gathered adult human benchmark. We find that GPT-4 and Flan-PaLM reach adult-level and near adult-level performance on our ToM tasks overall, and that GPT-4 exceeds adult performance on 6th-order inferences. Our results suggest that there is an interplay between model size and finetuning for higher-order ToM performance, and that the linguistic abilities of large models may support more complex ToM inferences. Given the important role that higher-order ToM plays in group social interaction and relationships, these findings have significant implications for the development of a broad range of social, educational and assistive LLM applications.

Cite This Study

Street et al. studied this question.

synapsesocial.com/papers/6a07a0be396fe5b3a88b38e1 https://doi.org/10.3389/fnhum.2025.1633272

