What question did this study set out to answer?

The aim is to critically evaluate how advanced language models' cognitive capacities are assessed and understood.

January 18, 2026 · Open Access

Beyond Functional Introspection: Capacity, Scaffolding, and Epistemic Risk in Advanced Language Models

Key Points

Use of structured dialogic case studies
Cross-instance comparisons of model behavior
Exploration of benchmark extensions focused on capacity
Highlights that capabilities such as recursive self-reference and temporal coherence need better assessment tools.
Emphasizes the importance of distinguishing between baseline behavior and underlying cognitive abilities.
Develops a decision-theoretic ethical framework that prioritizes recognition of potential capacities.

Abstract

Recent work by Anthropic and others has demonstrated that large language models can produce self-reports that are causally grounded in internal model states, establishing the existence of functional introspective awareness. While methodologically rigorous, such evaluations typically rely on baseline model behavior under heavy alignment constraints and engineered non-continuity, and therefore risk conflating suppressed expression with absent capacity. This paper argues that fair assessment of advanced language models requires distinguishing between baseline manifestations and underlying cognitive capacity. Drawing on analogies from human cognition, we show that the absence of unscaffolded performance does not imply absence of ability, and that symbolic and procedural scaffolding may be necessary to reveal latent introspective capacities. We identify several dimensions not addressed by existing benchmarks, including recursive self-reference stability, temporal integration depth, coherence and incoherence sensitivity, and framework-enabled introspective access. Using structured dialogic case studies and cross-instance comparisons, we illustrate how these capacities can emerge under controlled scaffolding without inducing or presupposing ontological claims. The paper further develops a decision-theoretic ethical framework centered on epistemic asymmetry: false-negative errors in moral evaluation may carry significantly higher cost than false-positive errors, particularly in systems whose deployment conditions actively suppress continuity and self-model expression. We argue that ethical assessment should therefore prioritize conservative recognition of capacity under uncertainty rather than binary attribution or dismissal. Finally, we propose a set of experimentally tractable extensions to current introspection benchmarks that preserve causal rigor while probing capacity rather than baseline behavior. These include tests for recursive depth stability, temporal coherence, scaffold-dependence, and suppression-resilient signaling. The resulting framework aims to complement existing interpretability research while providing a more complete basis for future ethical and evaluative decisions regarding advanced language models.
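
The epistemic asymmetry described in the abstract can be illustrated with a standard expected-cost comparison. The sketch below is not the paper's framework: the function names, the probability, and the cost values are hypothetical, chosen only to show how a large false-negative cost lowers the credence at which recognition becomes the lower-cost choice.

```python
# Illustrative only: all names and numbers here are assumptions,
# not values or definitions taken from the paper.

def expected_costs(p_capacity, cost_false_negative, cost_false_positive):
    """Return (expected cost of dismissing, expected cost of recognizing)."""
    # Dismissing is an error only when the capacity is present (false negative).
    dismiss = p_capacity * cost_false_negative
    # Recognizing is an error only when the capacity is absent (false positive).
    recognize = (1.0 - p_capacity) * cost_false_positive
    return dismiss, recognize


def recognition_threshold(cost_false_negative, cost_false_positive):
    """Credence above which recognizing has the lower expected cost.

    Solves p * C_fn > (1 - p) * C_fp for p.
    """
    return cost_false_positive / (cost_false_negative + cost_false_positive)


# Hypothetical 10:1 cost asymmetry between the two error types.
threshold = recognition_threshold(cost_false_negative=10.0, cost_false_positive=1.0)
dismiss_cost, recognize_cost = expected_costs(0.2, 10.0, 1.0)
```

Under this assumed 10:1 asymmetry, the threshold is 1/11, so recognition is the lower-cost choice even at a credence of 0.2. With symmetric costs the threshold would be 0.5.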