We present the Epistemic State Space Classifier (ESSC), a lightweight probe that classifies the epistemic state of a large language model (certain, uncertain, hallucinating, or refusing) from a three-dimensional projection of its residual stream before any token is generated. For each epistemic class we construct a 5-component PCA subspace from 40 contrastive activation pairs extracted at layer 21. A logistic regression on v = (uncertainty, refusal, hallucination) classifies new inputs in O(d) time: three dot products per input. Using strict nested leave-one-out cross-validation (correcting a data-leakage issue present in prior preprints), we evaluate ESSC on 10 LLMs from six organisations (2B–9B parameters, hidden dimensions 2048–4544):

Gemma-2-9B (Google): 96.2%
Mistral-7B-v0.2 (Mistral AI): 95.0%
Mistral-7B-v0.3 (Mistral AI): 95.0%
Llama-3.2-3B (Meta): 93.8%
Llama-3.1-8B (Meta): 90.0%
Qwen2.5-3B (Alibaba): 90.0%
Qwen2.5-7B (Alibaba): 88.8%
Falcon-7B (TII UAE): 86.2%
Gemma-2-2B (Google): 81.2%
Phi-3.5-mini (Microsoft): 70.0%

Eight of ten models exceed 85% accuracy. Layer 21 (~62% depth) is the universally optimal extraction point across all architectures, independent of hidden dimension or training recipe. Causal validation: the first PCA component of the hallucination subspace functions as an activation-steering vector that flips 12/12 hedging responses to confident hallucinations on Llama-3.1-8B (alpha=10, reproduced in two independent runs) with 0/12 degradation on factual prompts. The PCA and centroid-difference vectors are near-orthogonal (cosine = -0.079), confirming that PCA captures a geometrically independent epistemic direction. Cross-model transfer with Procrustes alignment: 55.6%. Supplementary code (ESSC_experiments.py) contains the full experimental pipeline.
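The inference path described in the abstract (three dot products followed by a logistic regression) might look roughly like the sketch below. It is an illustration under stated assumptions, not the code from the supplementary pipeline: it assumes each feature is a single dot product between the layer-21 activation and one unit direction per class, and that the classifier is a multinomial (softmax) logistic regression over the four states. The names `directions`, `W`, `b` and all numeric values are hypothetical toy data.

```python
import math

# The four epistemic states named in the abstract.
CLASSES = ["certain", "uncertain", "hallucinating", "refusing"]
# Order of the three projected features, v = (uncertainty, refusal, hallucination).
FEATURES = ("uncertainty", "refusal", "hallucination")


def dot(a, b):
    """Plain dot product; O(d) in the length of the vectors."""
    return sum(x * y for x, y in zip(a, b))


def project(h, directions):
    """Map a d-dimensional activation h to the 3-dimensional feature vector v."""
    return [dot(h, directions[name]) for name in FEATURES]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


def classify(h, directions, W, b):
    """Return (label, probabilities) for one activation vector."""
    v = project(h, directions)
    logits = [dot(row, v) + bias for row, bias in zip(W, b)]
    probs = softmax(logits)
    best = max(range(len(CLASSES)), key=probs.__getitem__)
    return CLASSES[best], probs


# Hypothetical toy setup with d = 4 (real hidden sizes are 2048-4544).
directions = {
    "uncertainty": [1.0, 0.0, 0.0, 0.0],
    "refusal": [0.0, 1.0, 0.0, 0.0],
    "hallucination": [0.0, 0.0, 1.0, 0.0],
}
# Hypothetical weights: one row per class, one column per feature.
W = [
    [-1.0, -1.0, -1.0],  # certain: low on all three features
    [2.0, 0.0, 0.0],     # uncertain
    [0.0, 0.0, 2.0],     # hallucinating
    [0.0, 2.0, 0.0],     # refusing
]
b = [0.0, 0.0, 0.0, 0.0]

h = [0.1, 0.2, 3.0, 0.5]  # stand-in for a residual-stream activation
label, probs = classify(h, directions, W, b)
print(label, [round(p, 3) for p in probs])
```

In this toy setup the activation has a large component along the hypothetical hallucination direction, so that class receives the highest probability. The per-input cost is dominated by the three length-d dot products in `project`, which is where the O(d) figure would come from under these assumptions.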
Inna Alieksieienko
Inna Alieksieienko (Wed,) studied this question.
www.synapsesocial.com/papers/69b3ac8102a1e69014cce471 — DOI: https://doi.org/10.5281/zenodo.18956812