April 22, 2026 | Open Access

Evaluating large language models for accuracy incentivizes hallucinations

Abstract

Large language models sometimes produce confident, plausible falsehoods (‘hallucinations’), limiting their reliability [1,2]. Previous work has offered numerous explanations and effective mitigations such as retrieval and tool use [3], consistency-based self-verification [4] and reinforcement learning from human feedback [5]. Nonetheless, the problem persists even in state-of-the-art language models [6,7]. Here we show how next-word prediction and accuracy-based evaluations inadvertently reward unwarranted guessing. Initially, next-word pretraining creates statistical pressure towards hallucination even with idealized error-free data: using learning theory [8,9], we show that facts lacking repeated support in training data (such as one-off details) yield unavoidable errors, whereas recurring regularities (such as grammar) do not. Subsequent training stages aim to correct such errors. However, dominant headline metrics such as accuracy systematically reward guessing over admitting uncertainty. To align incentives, we suggest two additions to the classic approach of adding error penalties to evaluations to control abstention [10,11]. First, we propose ‘open rubric’ evaluations that explicitly state how errors are penalized (if at all), which test whether a model modulates its abstentions to stated stakes while optimizing accuracy. Second, as hallucination-specific benchmarks rarely make leaderboards [12], we suggest using open-rubric variants of existing evaluations, to reverse their guessing incentives. Reframing hallucination as an incentive problem opens a practical path towards more reliable language models.
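The incentive argument in the abstract can be illustrated with a small expected-score calculation. The sketch below is not taken from the paper: it assumes a hypothetical rubric in which a correct answer scores +1, a wrong answer scores minus `wrong_penalty`, and abstaining scores 0. The function names and the scoring values are illustrative assumptions only.

```python
# Illustrative sketch only. The scoring rule (+1 correct, -wrong_penalty wrong,
# 0 for abstaining) and all names here are assumptions, not the paper's method.

ABSTAIN_SCORE = 0.0  # assumed score for answering "I don't know"


def expected_guess_score(p_correct: float, wrong_penalty: float) -> float:
    """Expected score of answering when the answer is right with probability p_correct."""
    return p_correct - (1.0 - p_correct) * wrong_penalty


def should_answer(p_correct: float, wrong_penalty: float) -> bool:
    """True if answering has a strictly higher expected score than abstaining."""
    return expected_guess_score(p_correct, wrong_penalty) > ABSTAIN_SCORE


def break_even_confidence(wrong_penalty: float) -> float:
    """Confidence at which answering and abstaining have equal expected score."""
    return wrong_penalty / (1.0 + wrong_penalty)


if __name__ == "__main__":
    # wrong_penalty = 0 corresponds to plain accuracy: any guess beats abstaining.
    for penalty in (0.0, 1.0, 3.0):
        threshold = break_even_confidence(penalty)
        print(f"penalty={penalty}: answer only if confidence > {threshold:.2f}")
```

Under plain accuracy (a penalty of zero) the break-even confidence is zero, so a score-maximizing model should guess even when it is almost certainly wrong. With a stated nonzero penalty, abstaining becomes the better choice below the break-even confidence, which is the kind of behaviour an evaluation with an explicit error penalty could test for.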

Cite This Study

Kalai et al. studied this question.

synapsesocial.com/papers/6a0fc6f190ecb39bf65fb003 https://doi.org/10.1038/s41586-026-10549-w
