What type of study is this?

This is a Experimental Study study.

October 10, 2025Open Access

Self-Correction Bench: Uncovering and Addressing the Self-Correction Blind Spot in Large Language Models

Key Points

Large language models demonstrate a significant self-correction blind spot, averaging 64.5% failure rate.
Controlled testing of 14 models showed that a 'Wait' prompt dramatically reduced the blind spot by 89.3%.
Self-Correction Bench serves as an innovative framework for evaluating error correction in LLMs.
Findings suggest training data influences LLMs' ability to correct their own errors compared to external ones.

Abstract

Although large language models (LLMs) have transformed AI, they still make mistakes and can explore unproductive reasoning paths. Self-correction capability is essential for deploying LLMs in safety-critical applications. We uncover a systematic failure: LLMs cannot correct errors in their own outputs while successfully correcting identical errors from external sources - a limitation we term the Self-Correction Blind Spot. To study this phenomenon, we introduce Self-Correction Bench, an evaluation framework to measure this phenomenon through controlled error injection at three complexity levels. Testing 14 open-source non-reasoning models, we find an average 64.5% blind spot rate. We provide multiple lines of evidence suggesting this limitation may be influenced by training data: human demonstrations rarely include error-correction sequences (favoring error-free responses), whereas reinforcement learning (RL) trained models learn error correction via outcome feedback. Remarkably, appending a minimal "Wait" prompt activates a 89.3% reduction in blind spots, suggesting dormant capabilities that require triggering. Our work highlights a critical limitation potentially influenced by training distribution and offers a practical approach to enhance LLM reliability and trustworthiness - vital for safety-critical domains.

Read Full Paperexternally

Demander à l'IA

Bookmark

View Full Paper