Title
Structural Alignment of Semantic Collapse: Cross-Model Analysis of Hallucination Smoothing Dynamics

Description
This paper presents empirical evidence of a shared geometric instability pattern in hallucination events that transcends specific large language model (LLM) architectures. By using the Gromov-Wasserstein (GW) distance to align the hidden-state manifolds of Llama-3-8B, Mistral-7B, and Qwen2-7B, we identify a Phase-Shifted Structural Alignment.

Key Findings:

Smoothing Dynamics: We define "smoothing" as the latent process by which transformer architectures reshape internal semantic contradictions into plausible-looking outputs. Our analysis shows that models differ primarily in the timing of their smoothing phase (layer depth) rather than in the structure of the underlying collapse.

Universal Alignment Window: Between 22% and 72% of network depth, failure trajectories exhibit reliably higher relational similarity across disparate architectures than correct trajectories do (GW(failure) - GW(correct) < 0, i.e., failure trajectories have the lower GW distance).

Phase-Shift Discovery: Although Mistral-7B initially appears to diverge at its output layer (L31), a time-shifted layer sweep confirms strong structural alignment with Qwen2 and Llama-3 during intermediate processing phases.

Semantic Runtime Kernel: These results provide the mathematical foundation for a model-agnostic safety layer, a "Semantic Runtime Kernel", capable of intercepting agent failure modes in real time, independent of the underlying model's coordinate system.

This work marks the completion of AIIE Phase 3, moving AI safety from post-hoc moderation to proactive runtime reliability engineering.
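The model-agnostic claim rests on a property of the Gromov-Wasserstein distance: it compares the distance structure inside each representation space, so hidden states of different widths never need a shared coordinate system. The toy sketch below illustrates that property; it is not the authors' pipeline. As simplifying assumptions it uses Euclidean distances, uniform weights, equal-sized point clouds, and a brute-force search over permutation couplings only, so it returns an upper bound on the squared-loss GW discrepancy and is practical only for a handful of points. A real analysis would use a dedicated solver such as the one in POT's `ot.gromov` module. The function names are hypothetical.

```python
import itertools
import math


def pairwise_distances(points):
    """Euclidean distance matrix computed within a single representation space."""
    return [[math.dist(a, b) for b in points] for a in points]


def gw_permutation_upper_bound(xs, ys):
    """Squared-loss GW objective minimised over permutation couplings only.

    xs and ys may have different dimensions (e.g. hidden states of two
    different models); only their intra-space distance matrices are compared.
    Restricting couplings to permutations makes this an upper bound on the
    true GW discrepancy (hypothetical simplification for illustration).
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch requires equally sized point clouds")
    n = len(xs)
    c1, c2 = pairwise_distances(xs), pairwise_distances(ys)
    best = math.inf
    for perm in itertools.permutations(range(n)):
        # Coupling T[i][perm[i]] = 1/n, so the GW objective reduces to
        # (1/n^2) * sum over (i, k) of (C1[i][k] - C2[perm[i]][perm[k]])^2.
        cost = sum(
            (c1[i][k] - c2[perm[i]][perm[k]]) ** 2
            for i in range(n)
            for k in range(n)
        ) / n**2
        best = min(best, cost)
    return best
```

Because only distance matrices enter the objective, a 2-D cloud embedded isometrically into 3-D (and listed in a different order) scores numerically zero, whereas a cloud with different geometry scores well above zero.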
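The depth window and the time-shifted layer sweep can be sketched as simple operations on precomputed per-layer GW distances. This is one plausible reading of the procedure, not the paper's code: it assumes relative depth is defined as l / (num_layers - 1), that per-layer GW values for failure and correct trajectories are already available as lists, and that the sweep is an integer layer offset applied to a (hypothetical) matrix of layer-to-layer GW distances between two models. All function names are illustrative.

```python
from statistics import mean


def depth_window(num_layers, start_frac=0.22, end_frac=0.72):
    """Layer indices whose relative depth lies in [start_frac, end_frac].

    Assumption: layer `layer` has relative depth layer / (num_layers - 1),
    so the first layer is 0.0 and the output layer is 1.0.
    """
    return [
        layer
        for layer in range(num_layers)
        if start_frac <= layer / (num_layers - 1) <= end_frac
    ]


def window_gap(gw_failure, gw_correct, layers):
    """Mean GW over `layers` for failure trajectories minus that for correct ones.

    A negative result means failure trajectories are structurally closer
    across models than correct trajectories inside the window.
    """
    return mean(gw_failure[layer] for layer in layers) - mean(
        gw_correct[layer] for layer in layers
    )


def best_layer_shift(gw_matrix, max_shift):
    """Return (shift, mean GW) for the layer offset that best aligns two models.

    gw_matrix[a][b] is a precomputed GW distance between layer a of model A
    and layer b of model B. For each shift s in [-max_shift, max_shift],
    gw_matrix[a][a + s] is averaged over every valid a; the smallest mean wins.
    """
    n_a, n_b = len(gw_matrix), len(gw_matrix[0])
    scores = {}
    for shift in range(-max_shift, max_shift + 1):
        pairs = [gw_matrix[a][a + shift] for a in range(n_a) if 0 <= a + shift < n_b]
        if pairs:
            scores[shift] = mean(pairs)
    best = min(scores, key=scores.get)
    return best, scores[best]
```

Under this depth convention, the 22% to 72% window of a 32-layer model covers layers 7 through 22. Comparing shifted layer pairs, rather than equal indices, is what lets a model whose output layer looks divergent still align with others at intermediate depths.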
Tomohiko Nakamura studied this question.