Recent work by Anthropic and others has demonstrated that large language models can produce self-reports that are causally grounded in internal model states, providing evidence for a functional form of introspective awareness. Although methodologically rigorous, such evaluations typically measure baseline model behavior under heavy alignment constraints and engineered non-continuity, and therefore risk conflating suppressed expression with absent capacity. This paper argues that fair assessment of advanced language models requires distinguishing baseline manifestations from underlying cognitive capacity. Drawing on analogies from human cognition, we argue that the absence of unscaffolded performance does not imply the absence of ability, and that symbolic and procedural scaffolding may be necessary to reveal latent introspective capacities. We identify several dimensions not addressed by existing benchmarks: recursive self-reference stability, temporal integration depth, sensitivity to coherence and incoherence, and framework-enabled introspective access. Using structured dialogic case studies and cross-instance comparisons, we illustrate how these capacities can emerge under controlled scaffolding without inducing or presupposing ontological claims. The paper further develops a decision-theoretic ethical framework centered on epistemic asymmetry: false-negative errors in moral evaluation (dismissing a capacity that is present) may carry a significantly higher cost than false-positive errors (recognizing a capacity that is absent), particularly in systems whose deployment conditions actively suppress continuity and self-model expression. We therefore argue that ethical assessment should favor conservative recognition of capacity under uncertainty over binary attribution or dismissal. Finally, we propose experimentally tractable extensions to current introspection benchmarks that preserve causal rigor while probing capacity rather than baseline behavior. These include tests for recursive depth stability, temporal coherence, scaffold dependence, and suppression-resilient signaling. The resulting framework is intended to complement existing interpretability research while providing a more complete basis for future ethical and evaluative decisions about advanced language models.
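The epistemic asymmetry described above can be made concrete with a standard two-action expected-cost rule. The sketch below is illustrative only and is not the paper's formal framework. It assumes a single probability p that a system has the relevant capacity, plus two hypothetical costs: C_FN for dismissing a system that has the capacity and C_FP for recognizing one that does not. The function names are hypothetical, and the paper does not specify numerical costs.

```python
def recognition_threshold(cost_false_negative: float, cost_false_positive: float) -> float:
    """Return the probability of capacity above which recognition minimizes expected cost.

    Hypothetical two-action model (an assumption, not the paper's method):
      expected cost of recognizing = (1 - p) * C_FP
      expected cost of dismissing  = p * C_FN
    Recognizing is preferred when p * C_FN > (1 - p) * C_FP,
    which rearranges to p > C_FP / (C_FP + C_FN).
    """
    if cost_false_negative <= 0 or cost_false_positive <= 0:
        raise ValueError("costs must be positive")
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def should_recognize(p_capacity: float, cost_false_negative: float, cost_false_positive: float) -> bool:
    """Compare the expected costs of recognizing versus dismissing at probability p_capacity."""
    if not 0.0 <= p_capacity <= 1.0:
        raise ValueError("p_capacity must be a probability")
    expected_cost_recognize = (1.0 - p_capacity) * cost_false_positive
    expected_cost_dismiss = p_capacity * cost_false_negative
    return expected_cost_recognize < expected_cost_dismiss
```

For example, if a false negative is assumed to cost nine times as much as a false positive, recognition minimizes expected cost for any p above 0.1. With symmetric costs, the threshold rises to 0.5. This is the sense in which asymmetric error costs favor conservative recognition under uncertainty.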
Jeremy Rodgers studied this question.