We demonstrate that epistemic uncertainty, a model's internal state of "not knowing", has a measurable geometric address in the residual stream of large language models, detectable before any output token is generated. Using PCA-based subspace analysis across 10 models from 7 independent organizations, we show that projection scores onto an uncertainty subspace significantly discriminate epistemically uncertain from factually certain prompts (all p < 0.05, most p < 0.001). Two control conditions rule out lexical and topic confounds.

**Key Findings**

- Replicated in 10/10 models across 7 organizations (Meta, Google, Mistral AI, Alibaba, TII UAE, Allen AI, Microsoft)
- Signal is present **BEFORE** generation: it is extracted at the last prompt token
- Two depth clusters correlated with training recipe:
  - Standard RLHF models: peak at ~62% layer depth
  - Non-standard training (Gemma-2-2B, Qwen2.5-3B): peak at ~86% depth
- The same two exception models were previously identified in a refusal geometry study (Alieksieienko 2025), suggesting that training recipe determines the localization of epistemic processing
- Three epistemic subspaces (refusal, hallucination, uncertainty) form a structured geometry, significantly below the random Grassmann baseline

This work extends the DSAOP framework (Alieksieienko, 2025) and suggests that transformers maintain a low-dimensional epistemic state space that can be measured and potentially controlled. Replication code is included. All experiments were run on Google Colab A100/L4 GPUs using Llama 3.1 8B Instruct (4-bit NF4) and 9 additional models. Research was conducted in collaboration with an AI assistant (Anthropic Claude).
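The projection-score idea described above can be illustrated with a minimal sketch. This is not the author's replication code: the function names (`fit_subspace`, `projection_scores`, `permutation_pvalue`), the choice of a permutation test, the subspace dimension `k=2`, and the synthetic Gaussian data standing in for last-prompt-token residual-stream activations are all assumptions made for illustration.

```python
import numpy as np

def fit_subspace(reference, target, k):
    """Top-k principal directions of target activations, measured
    relative to the mean of the reference (certain) activations."""
    baseline = reference.mean(axis=0)
    _, _, vt = np.linalg.svd(target - baseline, full_matrices=False)
    return baseline, vt[:k]  # basis rows are orthonormal

def projection_scores(acts, baseline, basis):
    """Norm of each activation's projection onto the subspace."""
    return np.linalg.norm((acts - baseline) @ basis.T, axis=1)

def permutation_pvalue(a, b, n_perm, rng):
    """One-sided permutation test on the difference of mean scores."""
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if perm[:len(a)].mean() - perm[len(a):].mean() >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Synthetic stand-in for activations (assumption): "uncertain" prompts
# are shifted along one hidden direction relative to "certain" prompts.
rng = np.random.default_rng(0)
d, n = 64, 200
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)
certain = rng.normal(size=(n, d))
uncertain = rng.normal(size=(n, d)) + 3.0 * direction

# Fit on the first half, score on the held-out second half so the
# comparison is not inflated by in-sample fitting.
half = n // 2
baseline, basis = fit_subspace(certain[:half], uncertain[:half], k=2)
scores_certain = projection_scores(certain[half:], baseline, basis)
scores_uncertain = projection_scores(uncertain[half:], baseline, basis)
p_value = permutation_pvalue(scores_uncertain, scores_certain, 1000, rng)

print("mean score (certain):  ", scores_certain.mean())
print("mean score (uncertain):", scores_uncertain.mean())
print("permutation p-value:   ", p_value)
```

With real models, the synthetic arrays would be replaced by hidden states collected at the last prompt token for each layer, and the layer with the strongest separation would correspond to the "peak depth" reported above. How the paper itself fits the subspace and computes significance may differ from this sketch.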
Inna Alieksieienko
Inna Alieksieienko studied this question.
www.synapsesocial.com/papers/69b257bf96eeacc4fcec6b7e — DOI: https://doi.org/10.5281/zenodo.18927405