What question did this study set out to answer?

The study aims to understand why experts either correct or accept algorithm-generated scores in grading contexts.

June 11, 2026 | Open Access

Why do experts miss AI’s errors? Evidence from a randomized labeling experiment

Key Points

The study aims to understand why experts either correct or accept algorithm-generated scores in grading contexts.
The design was a preregistered randomized experiment in which education experts reviewed identical student work.
Each intentionally inaccurate score was labeled as human- or AI-generated and independently varied to be too harsh or too lenient.
Mediation analysis explored which factors drove expert corrections.
Under a harsh recommendation, the grading fairness gap was 22% larger when the score was labeled as AI-generated.
Under a lenient recommendation, the fairness gap was statistically indistinguishable between AI and human labels.
In the harsh scenario, higher perceived ability and responsibility of the algorithm explained over half of the effect.

Abstract

As organizations increasingly rely on algorithmic decision aids, human oversight is vital to prevent automated errors from spreading. But what makes experts correct an algorithm or let it stand? In a preregistered randomized experiment, education experts reviewed identical student work paired with an intentionally inaccurate score labeled as either human- or AI-generated. We independently varied whether the score was too harsh or too lenient. The outcome, the grading fairness gap, measures the distance between the expert’s revised mark and the objective rating. Under a harsh recommendation, the gap was 22% larger when the score was labeled as AI-generated; in the lenient case, the fairness gap under AI and human labels was statistically indistinguishable. Mediation analysis reveals that higher perceived ability and responsibility of the algorithm in the harsh scenario explain over half of the effect, while weaker attributions in the lenient case lead to stricter corrections. Thus, deference to AI depends not on automation itself but on the direction of its errors and the credibility it signals, which offers design insights for accountable human–AI collaboration.
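
To make the outcome measure concrete, the following is a minimal Python sketch of how a grading fairness gap could be computed and compared across the four cells of the design. It rests on assumptions not stated in the abstract: the "distance" is taken to be an absolute difference, the function and field names are hypothetical, and the records are made-up toy values, not data from the study.

```python
from statistics import mean


def fairness_gap(revised_mark, objective_rating):
    """Distance between the expert's revised mark and the objective rating.

    Assumption: distance is the absolute difference; the abstract does not
    specify the exact metric.
    """
    return abs(revised_mark - objective_rating)


def mean_gap_by_cell(records):
    """Average fairness gap for each (direction, label) cell of the 2x2 design."""
    cells = {}
    for r in records:
        key = (r["direction"], r["label"])
        cells.setdefault(key, []).append(fairness_gap(r["revised"], r["objective"]))
    return {key: mean(gaps) for key, gaps in cells.items()}


def relative_difference(gap_ai, gap_human):
    """Relative size of the AI-label gap compared with the human-label gap."""
    return (gap_ai - gap_human) / gap_human


# Hypothetical toy records for illustration only; NOT data from the study.
records = [
    {"direction": "harsh", "label": "AI", "revised": 60, "objective": 70},
    {"direction": "harsh", "label": "AI", "revised": 62, "objective": 70},
    {"direction": "harsh", "label": "human", "revised": 63, "objective": 70},
    {"direction": "harsh", "label": "human", "revised": 62, "objective": 70},
    {"direction": "lenient", "label": "AI", "revised": 74, "objective": 70},
    {"direction": "lenient", "label": "AI", "revised": 76, "objective": 70},
    {"direction": "lenient", "label": "human", "revised": 75, "objective": 70},
    {"direction": "lenient", "label": "human", "revised": 75, "objective": 70},
]

cells = mean_gap_by_cell(records)
for direction in ("harsh", "lenient"):
    diff = relative_difference(cells[(direction, "AI")], cells[(direction, "human")])
    print(f"{direction}: AI-label gap differs from human-label gap by {diff:.0%}")
```

A larger gap means the expert's revised mark stayed further from the objective rating, that is, the inaccurate score was corrected less. The study's own estimates would come from its statistical model rather than from raw cell means like these.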
