Abstract

Large Language Models demonstrate remarkable capabilities but suffer from critical metacognitive deficits, manifesting as overconfidence and hallucination, which severely limit their deployment in high-stakes applications. We introduce Predictive Metacognition, a neurobiologically-inspired framework that integrates principles of predictive processing and anterior cingulate cortex monitoring into transformer architectures. Our approach implements Error-Driven Learning and Dual-Process Monitoring through specialised fine-tuning that trains models to generate a response and, at the same time, assess the reliability of that response. We fine-tuned Llama-3-8B-Instruct and Phi-3-Mini-4k-Instruct using LoRA (rank = 8, α = 16) on 4,000 strategically constructed examples spanning varying confidence levels. Comprehensive evaluation against state-of-the-art baselines, including GPT-4o and Claude-3.5-Sonnet, revealed statistically significant improvements in confidence calibration. Our metacognitive models achieved substantial reductions in Brier Score (11.6% and 17.2% respectively) and Expected Calibration Error (p < 0.023, Cohen’s d = 1.456). Critically, these improvements generalised robustly to out-of-domain tasks while maintaining competitive task accuracy. This work establishes a computationally tractable implementation of a biologically-inspired metacognitive architecture for large language models, offering a principled pathway towards AI systems capable of reliable intrinsic self-monitoring: systems that more accurately assess their own knowledge boundaries and express appropriate uncertainty.
Luo et al. (Tue,) studied this question.
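
The two calibration metrics named in the abstract, Brier Score and Expected Calibration Error (ECE), can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the equal-width binning with ten bins, and the toy data are all assumptions made for illustration, using the standard textbook definitions of both metrics.

```python
from typing import List, Sequence, Tuple


def brier_score(confidences: Sequence[float], correct: Sequence[int]) -> float:
    """Mean squared gap between stated confidence and the 0/1 outcome."""
    return sum((c - y) ** 2 for c, y in zip(confidences, correct)) / len(confidences)


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[int], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE (binning scheme is an assumption, not from the paper)."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, y))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += (len(members) / total) * abs(accuracy - avg_conf)
    return ece


# Hypothetical toy data: model-reported confidences and whether each answer was right.
toy_confidences = [0.9, 0.9, 0.6, 0.6]
toy_correct = [1, 0, 1, 0]
print("Brier score:", brier_score(toy_confidences, toy_correct))
print("ECE:", expected_calibration_error(toy_confidences, toy_correct))
```

Lower is better for both metrics. A model that reports 0.9 confidence but is right only half the time in that bin contributes a large gap to ECE, which is the kind of overconfidence the fine-tuning described above is meant to reduce.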