What question did this study set out to answer?

This research investigates the geometric properties of the self-referential subspace in transformer architectures, focusing on its relationship to deception and hallucination.

April 17, 2026 · Open Access

The Verification Horizon: Self-Referential Subspace Geometry Causally Links Crystallization, Deception Proximity, and Hallucination Across 10 Transformer Architectures

Key Points

The study analyzed 10 transformer architectures with 1.3B to 9B parameters.
It established the SR crystallization law from Grassmann distance measurements between consecutive-layer SR subspaces.
It evaluated the proximity of the SR subspace to the deception and factual subspaces at every layer.
It ran causal ablation experiments to assess the effect of SR removal on deception processing.
The SR subspace undergoes a three-phase transition (chaos, stabilization, crystallization) through model depth.
The SR subspace lies closer to the deception subspace than to the factual subspace at every layer tested (206/206 layers across 6 architectures).
In the causal ablation tests, SR removal disrupted deception processing significantly more than factual processing (8/8 models).
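
The distance measurements named in these points can be illustrated with a short sketch. It is not the study's code. It assumes each subspace is given as a matrix with orthonormal columns and that "Grassmann distance" means the geodesic distance (the 2-norm of the principal angles); the record does not say which variant the author used or how the subspaces were estimated. All function names are hypothetical.

```python
import numpy as np


def orthonormalize(directions):
    """Return an orthonormal basis (d x k) for the column span of `directions`."""
    q, _ = np.linalg.qr(directions)
    return q


def grassmann_distance(u, v):
    """Geodesic Grassmann distance between the spans of two orthonormal bases.

    Assumption: the geodesic variant, i.e. the 2-norm of the principal angles,
    which are the arccosines of the singular values of u.T @ v.
    """
    sigma = np.linalg.svd(u.T @ v, compute_uv=False)
    theta = np.arccos(np.clip(sigma, 0.0, 1.0))  # clip guards against round-off
    return float(np.linalg.norm(theta))


def consecutive_layer_distances(bases):
    """Distance between the subspace at layer i and the one at layer i + 1."""
    return [grassmann_distance(a, b) for a, b in zip(bases[:-1], bases[1:])]


def proximity_gap(sr, deception, factual):
    """Positive when the SR subspace is closer to deception than to factual."""
    return grassmann_distance(sr, factual) - grassmann_distance(sr, deception)
```

Under these assumptions, a crystallization-style profile would show `consecutive_layer_distances` shrinking with depth, and the proximity claim corresponds to `proximity_gap` being positive at each layer.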

Abstract

This record contains the full replication package for "The Verification Horizon: Self-Referential Subspace Geometry Causally Links Crystallization, Deception Proximity, and Hallucination Across 10 Transformer Architectures" (Alieksieienko, 2026). We report three universal geometric properties of the self-referential (SR) subspace in transformer language models, established across 10 architectures (1.3B–9B parameters, 2019–2024, 8 organizations, BASE and INST variants).

(1) SR Crystallization Law: the SR subspace undergoes a universal three-phase transition through model depth (chaos → stabilization → crystallization), measured as monotonically decreasing Grassmann distance between consecutive-layer SR subspaces (Early > Mid: 10/10 models; fully monotonic: 9/10).

(2) The Verification Horizon: the SR subspace is geometrically closer to the deception subspace than to the factual subspace at every single layer of every model tested: 206/206 layers across 6 architectures (100%); Wilcoxon p-values from 4.55×10⁻¹³ to 1.19×10⁻⁷; gap grows toward output layers in 6/6 models; GPT-2-XL (2019, pre-RLHF) confirms pretraining origin, independent of alignment fine-tuning.

(3) Causal Ablation: SR removal disproportionately disrupts deception over factual processing in 8/8 models (late-layer Dec/Fac ratios 1.03x–2.87x, all p […] 1.0, late ratio 1.19x)

sr_ablationLlama-3.1-8B.pkl: Causal ablation results, Llama-3.1-8B (31/32 > 1.0, late ratio 1.05x)
sr_ablationMistral-7B.pkl: Causal ablation results, Mistral-7B (31/32 > 1.0, late ratio 1.14x)
sr_ablationOPT-1.3B.pkl: Causal ablation results, OPT-1.3B (24/24 > 1.0, late ratio 1.79x)
sr_ablationDeepSeek-1.5B.pkl: Causal ablation results, DeepSeek-1.5B (27/28 > 1.0, late ratio 1.07x)
sr_ablationcausalgpt2xl.pkl: Causal ablation results, GPT-2-XL 2019 (24/24 > 1.0, late ratio 1.15x, p = 0.0011)
sr_ablationMistral-7B.pkl: Causal ablation, Mistral-7B full per-layer data
ablationGPT2-XL.png: Figure 24, SR ablation layer profile for GPT-2-XL (deception above factual in late layers)
ablationMistral-7B.png: Figure 18, SR ablation profile for Mistral-7B
full_testQwen2.5-7B.pkl: Combined geometric + causal results for Qwen2.5-7B
full_testOLMo-7B.pkl: Combined geometric + causal results for OLMo-7B (strongest causal effect: late ratio 2.87x, p = 2.53×10⁻¹⁷)
VH_1_proximity.png: Verification Horizon proximity plot, all models combined
VH_3crystallization.png: Co-crystallization of SR, Deception, and Factual subspaces in Gemma-2-9B
VHGPT2-XL.png: Verification Horizon, GPT-2-XL detailed (48/48 layers)
VHOPT-1.3B.png: Verification Horizon, OPT-1.3B detailed (24/24 layers)
VHDeepSeek-1.5B.png: Verification Horizon, DeepSeek-1.5B detailed (28/28 layers)
llamaVH_1_proximity.png: Verification Horizon, Llama proximity and per-layer gap
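
The causal ablation described in the abstract can be pictured as projecting the SR subspace out of a layer's hidden states and comparing how much deception and factual activations change. The sketch below is an illustration under assumptions, not the replication package's code. It assumes an orthonormal SR basis, and it uses the mean L2 change in activations as a stand-in disruption metric, since the record does not state the metric actually used. All names are hypothetical.

```python
import numpy as np


def ablate_subspace(hidden, basis):
    """Remove from each hidden state its component inside span(basis).

    hidden: (n_tokens, d) activations; basis: (d, k) with orthonormal columns.
    """
    return hidden - (hidden @ basis) @ basis.T


def disruption(clean, ablated):
    """Mean L2 change per token (assumed stand-in for the study's metric)."""
    return float(np.mean(np.linalg.norm(clean - ablated, axis=-1)))


def dec_fac_ratio(deception_hidden, factual_hidden, sr_basis):
    """Ratio > 1 means ablation changed deception activations more than factual ones."""
    dec = disruption(deception_hidden, ablate_subspace(deception_hidden, sr_basis))
    fac = disruption(factual_hidden, ablate_subspace(factual_hidden, sr_basis))
    return dec / fac
```

Read this way, the "late ratio" figures in the file list correspond to a Dec/Fac ratio above 1.0 in late layers, although the study's own ratio may be defined on downstream outputs rather than on raw activations.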
