What question did this study set out to answer?

The aim is to compare GPT-5 mini's performance on emotion recognition tasks against that of human participants and to explore collective intelligence effects.

March 26, 2026 | Open Access

Collective and augmented intelligence outperform artificial intelligence on emotion recognition tests

Key Points

Compared GPT-5 mini with human responses on the RMET and MRMET tasks.
Aggregated independent human responses using bootstrap-resampled plurality voting.
Investigated the performance of combined human and AI responses.
GPT-5 mini outperforms individual human participants on both RMET and MRMET.
When aggregated, human responses significantly surpass those of GPT-5 mini.
An augmented intelligence approach that combines human and AI responses outperforms either source alone.

Abstract

As artificial intelligence (AI) systems increasingly match human performance on standardized mental state recognition tasks, the question is no longer whether AI is human-level, but which humans define that level. This question remains underexplored. This study evaluates GPT-5 mini against the full spectrum of human ability, not just average performance, on standardized forced-choice emotion and mental state recognition tasks including the Reading the Mind in the Eyes Test (RMET) and the Multiracial Reading the Mind in the Eyes Test (MRMET). At the individual level, GPT-5 mini outperforms human participants across nearly all performance levels on both the RMET and MRMET. Yet this advantage is reversed when independent responses are aggregated. Using bootstrap-resampled plurality voting to aggregate independent responses, we find that human responses significantly outperform those of GPT-5 mini. This wisdom-of-crowds effect cannot be replicated through repeated sampling from AI models. Furthermore, an augmented approach that aggregates bootstrapped human and AI responses together outperforms either source alone. These findings suggest that evaluating AI against average human performance risks mistaking AI mediocrity for human excellence. We discuss the implications of these findings for combining human and machine intelligence to surpass what either achieves in isolation.
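
The aggregation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the per-item resampling scheme, the random tie-breaking rule, the group sizes, and the toy responses are all assumptions made for illustration, and the numbers it prints say nothing about the study's results. The sketch draws groups of responses with replacement for each item, takes the plurality answer of each group, and scores it against the answer key. Pooling human and AI responses before resampling is one possible reading of the augmented approach.

```python
import random
from collections import Counter


def plurality_vote(responses, rng):
    """Return the most frequent option; ties are broken at random (an assumption)."""
    counts = Counter(responses)
    top = max(counts.values())
    tied = sorted(option for option, n in counts.items() if n == top)
    return rng.choice(tied)


def bootstrap_plurality_accuracy(responses_by_item, answer_key, group_size,
                                 n_boot=1000, seed=0):
    """Estimate the accuracy of plurality-voting groups of `group_size` responses.

    responses_by_item: dict mapping item id -> list of independent responses
    answer_key:        dict mapping item id -> correct option
    """
    rng = random.Random(seed)
    hits = 0
    for item, responses in responses_by_item.items():
        for _ in range(n_boot):
            # Bootstrap step: sample a group with replacement from the pool.
            group = rng.choices(responses, k=group_size)
            hits += plurality_vote(group, rng) == answer_key[item]
    return hits / (n_boot * len(responses_by_item))


# Hypothetical toy data, invented for this sketch (not taken from the study).
answer_key = {"item1": "playful", "item2": "upset"}
human = {
    "item1": ["playful"] * 6 + ["comforting"] * 3 + ["bored"],
    "item2": ["upset"] * 5 + ["annoyed"] * 4 + ["hostile"],
}
ai = {"item1": ["playful"] * 3, "item2": ["annoyed"] * 3}

# Assumed form of the augmented approach: pool both sources, then resample.
pooled = {item: human[item] + ai[item] for item in human}

for size in (1, 5, 25):
    human_acc = bootstrap_plurality_accuracy(human, answer_key, size)
    pooled_acc = bootstrap_plurality_accuracy(pooled, answer_key, size)
    print(f"group size {size}: human {human_acc:.3f}, pooled {pooled_acc:.3f}")
```

With a group size of 1, the estimate approximates the accuracy of a single response drawn from the pool. Larger groups show how plurality voting behaves as more independent responses are combined, which is the comparison the abstract draws between individual and aggregated performance.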

Akben et al. studied this question.

synapsesocial.com/papers/69c4ccaffdc3bde44891826a https://doi.org/10.1038/s41598-026-45331-5
