What question did this study set out to answer?

This research aims to address metacognitive deficits in large language models by implementing predictive metacognition.

May 28, 2026 | Open Access

Predictive metacognition: a neuro-computational framework for self-monitoring in large language models

Key Points

This research aims to address metacognitive deficits in large language models by implementing predictive metacognition.
Integrated predictive processing principles into transformer architectures.
Fine-tuned Llama-3-8B-Instruct and Phi-3-Mini-4k-Instruct models using LoRA on 4,000 examples.
Evaluated metacognitive models against GPT-4o and Claude-3.5-Sonnet for calibration improvements.
Fine-tuned metacognitive models reduced Brier Score by 11.6% and 17.2%, respectively.
Achieved a statistically significant improvement in Expected Calibration Error (p < 0.023, Cohen’s d = 1.456).
Models maintained competitive task accuracy while generalizing to out-of-domain tasks.

Abstract

Large Language Models demonstrate remarkable capabilities but suffer from critical metacognitive deficits, manifesting as overconfidence and hallucination, which severely limit their deployment in high-stakes applications. We introduce Predictive Metacognition, a neurobiologically-inspired framework that integrates principles of predictive processing and anterior cingulate cortex monitoring into transformer architectures. Our approach implements Error-Driven Learning and Dual-Process Monitoring through specialised fine-tuning that trains models to simultaneously generate responses and assess their own performance reliability. We fine-tuned Llama-3-8B-Instruct and Phi-3-Mini-4k-Instruct using LoRA (rank=8, α=16) on 4,000 strategically constructed examples spanning varying confidence levels. Comprehensive evaluation against state-of-the-art baselines, including GPT-4o and Claude-3.5-Sonnet, revealed statistically significant improvements in confidence calibration. Our metacognitive models achieved substantial reductions in Brier Score (11.6% and 17.2% respectively) and Expected Calibration Error (p < 0.023, Cohen’s d = 1.456). Critically, these improvements generalised robustly to out-of-domain tasks while maintaining competitive task accuracy. This work establishes a computationally tractable implementation of biologically-inspired metacognitive architecture for large language models, offering a principled pathway towards AI systems capable of reliable intrinsic self-monitoring that can more accurately assess their own knowledge boundaries and express appropriate uncertainty.
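For readers unfamiliar with the two calibration metrics named in the abstract, the following is a minimal sketch of how Brier Score and Expected Calibration Error are commonly computed from a model's stated confidences and its 0/1 correctness labels. It is not taken from the paper: the function names, the choice of 10 equal-width bins, and the `relative_reduction` helper (which assumes the reported percentages are reductions relative to a baseline) are all illustrative assumptions.

```python
from typing import Sequence


def brier_score(confidences: Sequence[float], correct: Sequence[int]) -> float:
    """Mean squared gap between stated confidence and the 0/1 outcome."""
    if len(confidences) != len(correct) or len(confidences) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((c - y) ** 2 for c, y in zip(confidences, correct))
    return total / len(confidences)


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[int], n_bins: int = 10
) -> float:
    """Size-weighted average of |accuracy - mean confidence| over equal-width bins.

    The bin count is an assumption; the source does not state the binning used.
    """
    if len(confidences) != len(correct) or len(confidences) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, y))
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage by which `improved` is lower than `baseline` (hypothetical helper)."""
    return 100.0 * (baseline - improved) / baseline


if __name__ == "__main__":
    # Toy values for illustration only; these are not results from the study.
    conf = [0.9, 0.9, 0.9, 0.9, 0.6, 0.6]
    hit = [1, 1, 1, 0, 1, 0]
    print("Brier:", brier_score(conf, hit))
    print("ECE:", expected_calibration_error(conf, hit))
```

Lower is better for both metrics. A model that says "90% confident" and is right about 90% of the time on those answers contributes nothing to the ECE for that bin, which is the sense of calibration the abstract refers to.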
