We demonstrate that a three-dimensional geometric vector, constructed from projection scores onto uncertainty, refusal, and hallucination subspaces in the transformer residual stream, predicts behavioral class (certain, uncertain, hallucination-prone, refusal) before any output token is generated. Leave-one-out accuracy is 88–98% on eight of the ten models tested and 65.0% and 73.3% on the other two.

Results across 10 models from 6 organizations:

- Llama-3.1-8B (Meta): 98.3% ✓
- Mistral-7B-v0.2/v0.3 (Mistral AI): 96.7% ✓
- Falcon-7B (TII UAE): 98.3% ✓
- Llama-3.2-3B (Meta): 95.0% ✓
- Qwen2.5-3B (Alibaba): 96.7% ✓
- Gemma-2-2B (Google): 88.3% ✓
- Gemma-2-9B (Google): 90.0% ✓
- Qwen2.5-7B (Alibaba): 65.0% ⚠
- Phi-3.5-mini (Microsoft): 73.3% ⚠

This extends prior work (Alieksieienko, 2026) from subspace detection to behavioral prediction, establishing that transformers maintain a structured, measurable epistemic state space. Potential applications include real-time hallucination detection, safety monitoring, and uncertainty routing. Research conducted in collaboration with an AI assistant (Anthropic Claude).
Inna Alieksieienko studied this question.
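The pipeline summarized in the abstract (project a residual-stream activation onto three subspaces, form a three-dimensional vector, evaluate a classifier with leave-one-out) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the activations are synthetic random vectors rather than real model states, the subspace bases are random orthonormal blocks, the hidden size, subspace rank, and sample count are arbitrary, and a nearest-centroid classifier stands in for whatever classifier the author actually used. The helper names `projection_score`, `epistemic_vector`, and `leave_one_out_accuracy` are hypothetical.

```python
import numpy as np


def projection_score(h, basis):
    """Norm of the orthogonal projection of h onto the column span of basis."""
    q, _ = np.linalg.qr(basis)  # orthonormalize the basis columns
    return float(np.linalg.norm(q.T @ h))


def epistemic_vector(h, bases):
    """Stack one projection score per subspace into a 3-D feature vector."""
    return np.array([projection_score(h, b) for b in bases])


def leave_one_out_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier (assumed here)."""
    n = len(y)
    correct = 0
    for i in range(n):
        mask = np.arange(n) != i  # hold out example i
        X_tr, y_tr = X[mask], y[mask]
        labels = np.unique(y_tr)
        centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in labels])
        pred = labels[np.argmin(np.linalg.norm(centroids - X[i], axis=1))]
        correct += int(pred == y[i])
    return correct / n


# --- Synthetic stand-in data (all sizes are illustrative assumptions) ---
rng = np.random.default_rng(0)
d, k, per_class = 64, 4, 15  # hidden size, subspace rank, examples per class

# Three mutually orthogonal subspaces: uncertainty, refusal, hallucination.
q, _ = np.linalg.qr(rng.normal(size=(d, 3 * k)))
bases = [q[:, i * k:(i + 1) * k] for i in range(3)]

# Which subspace each behavioral class excites; "certain" excites none.
active = {"certain": None, "uncertain": 0, "refusal": 1, "hallucination-prone": 2}

X, y = [], []
for label, idx in active.items():
    for _ in range(per_class):
        h = rng.normal(size=d)  # fake residual-stream activation
        if idx is not None:
            h = h + 8.0 * bases[idx][:, 0]  # inject a class-specific signal
        X.append(epistemic_vector(h, bases))
        y.append(label)
X, y = np.array(X), np.array(y)

accuracy = leave_one_out_accuracy(X, y)
print(f"Leave-one-out accuracy on synthetic data: {accuracy:.3f}")
```

With real models, `h` would be replaced by an activation read from the residual stream before generation begins, and `bases` by subspaces estimated from labeled prompts. The layer, token position, and estimation procedure are not specified in the abstract, so they are left open here.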