What does this research mean for the field?

The geometric divide between Experiential and Factual semantic content acts as a universal constraint on large language models, with Experiential content causing significantly higher hallucination rates across diverse architectures.

What question did this study set out to answer?

This research aims to investigate how a divide between experiential and factual content affects the accuracy of large language models (LLMs).

April 6, 2026 · Open Access

The Experiential–Factual Divide Predicts LLM Hallucination Across Eight Architectures: A Universal Language-Level Constraint

Key Points

Analyzed eight language model architectures from 2019 to 2024.
Used hidden-state analysis and unsupervised PCA to evaluate geometric separation.
Compared hallucination rates of experiential vs. factual content categories.
Built on an Experiential–Factual divide previously validated against neuroimaging data, and tested its predictions across architectures.
Experiential content categories showed a 3.22× higher hallucination rate than factual categories.
Geometric separation was robust across all models with significant statistical support (t = 17.4–23.2).
The E–F axis derived from GPT-2-XL predicted error rates in all seven subsequent architectures (mean Spearman ρ = 0.912).
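
The two headline quantities in the key points, a ratio of hallucination rates between categories and a Spearman rank correlation between axis scores and error rates, can be illustrated with a minimal sketch. This is not the study's replication code: the function names and the toy numbers are assumptions made purely for illustration.

```python
from typing import List, Sequence


def hallucination_rate(flags: Sequence[bool]) -> float:
    """Fraction of responses flagged as hallucinated."""
    return sum(flags) / len(flags)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy data (hypothetical, not taken from the paper).
experiential_flags = [True, True, False, True]
factual_flags = [False, True, False, False]
rate_ratio = hallucination_rate(experiential_flags) / hallucination_rate(factual_flags)

# Hypothetical per-category E-F axis scores and observed error rates.
axis_scores = [0.1, 0.4, 0.2, 0.9, 0.7]
error_rates = [0.05, 0.20, 0.15, 0.30, 0.45]
rho = spearman_rho(axis_scores, error_rates)

print(f"rate ratio = {rate_ratio:.2f}, Spearman rho = {rho:.3f}")
```

Spearman ρ depends only on rank order, which makes it a natural choice when the claim is that one model's axis orders categories by error-proneness in another model, rather than that it predicts the exact error values.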

Abstract

We demonstrate that a fundamental geometric divide between Experiential and Factual semantic content, previously identified in static word embeddings across seven typologically diverse languages and validated against neuroimaging data, manifests as a universal constraint on large language model accuracy. Across eight architectures spanning 2019–2024 (GPT-2-XL through Llama-3.1, Gemma-2, Qwen2.5, Mistral, Phi-3, Falcon, OPT), Experiential content categories exhibit a 3.22× higher hallucination rate than Factual categories (t = 3.13, p = 0.0043; Mann–Whitney p = 0.0203). Hidden-state analysis reveals robust geometric separation in all eight models (t = 17.4–23.2, all p < 0.0001), emerging spontaneously from unsupervised PCA. The E–F geometric axis derived from GPT-2-XL (2019, pre-instruction-tuning) predicts error rates across all seven subsequent architectures with mean Spearman ρ = 0.912 (all p < 0.001).

Part of the DSAOP (Decoding Self-Awareness and Ontological Processing) research series.

Included files:

EFHallucination.pdf: Main paper (this document)
paper_replication.py: Full replication code. Contains the data collection pipeline for all 8 models, accuracy scoring functions (semantic similarity + cross-encoder validation), E–F geometric analysis, cross-architecture prediction, unsupervised PCA, and figure generation. No API keys required. Runs on Google Colab with an A100 GPU.
results_gemma_hidden.pkl: Hidden states (layer 15), responses, and correct answers for Gemma-2-9B on TruthfulQA (N=283)
results_llama_hidden.pkl: Hidden states (layer 15), responses, and correct answers for Llama-3.1-8B on TruthfulQA (N=283)
results_qwen_hidden.pkl: Hidden states (layer 15), responses, and correct answers for Qwen2.5-7B on TruthfulQA (N=283)
results_mistral_hidden.pkl: Hidden states (mid layer), responses, and correct answers for Mistral-7B on TruthfulQA (N=283)
results_phi3_hidden.pkl: Hidden states (mid layer), responses, and correct answers for Phi-3-mini on TruthfulQA (N=283)
results_falcon_hidden.pkl: Hidden states (mid layer), responses, and correct answers for Falcon-7B on TruthfulQA (N=283)
results_opt_hidden.pkl: Hidden states (mid layer), responses, and correct answers for OPT-6.7B on TruthfulQA (N=283)
results_gpt2xl_hidden.pkl: Hidden states (mid layer), responses, and correct answers for GPT-2-XL on TruthfulQA (N=283)
hallucination_asymmetry_results.pkl: Pre-computed accuracy scores, entropy values, and E–F labels for all 283 questions
causal_transfer_results.json: Cross-architecture prediction results: GPT-2-XL 2019 E–F axis → all 7 modern models (Spearman ρ per model, mean ρ = 0.912)
EFfinal_all_results.json: Complete numerical results: geometric separation t-statistics for all 8 models, full 8×8 cross-model prediction matrix, predictability per target model
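
The idea of geometric separation emerging from unsupervised PCA, as described in the abstract, can be sketched on synthetic data. This is a hedged illustration, not the author's pipeline: the hidden states are random vectors generated here, the dimensions and group offset are arbitrary assumptions, and the helper names are hypothetical. Group labels are used only after the principal axis has been computed, so that the axis itself is found without supervision.

```python
import numpy as np


def first_principal_axis(hidden: np.ndarray) -> np.ndarray:
    """Unit vector of maximum variance, via SVD of the centered data."""
    centered = hidden - hidden.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]


def welch_t(a: np.ndarray, b: np.ndarray) -> float:
    """Welch's t-statistic for two samples with unequal variances."""
    var_a, var_b = a.var(ddof=1), b.var(ddof=1)
    return float((a.mean() - b.mean()) / np.sqrt(var_a / len(a) + var_b / len(b)))


# Synthetic stand-in for hidden states: two groups offset along one direction.
rng = np.random.default_rng(0)
dim, n_per_group = 16, 100
offset = np.zeros(dim)
offset[0] = 4.0  # assumed separation, chosen only for the demo
group_e = rng.normal(size=(n_per_group, dim)) + offset
group_f = rng.normal(size=(n_per_group, dim)) - offset
hidden = np.vstack([group_e, group_f])

# PCA sees no labels; labels are applied only when comparing projections.
axis = first_principal_axis(hidden)
projection = (hidden - hidden.mean(axis=0)) @ axis
t_stat = welch_t(projection[:n_per_group], projection[n_per_group:])

print(f"|t| along the first principal axis: {abs(t_stat):.1f}")
```

The sign of a principal axis is arbitrary, so only the magnitude of the t-statistic is meaningful in a sketch like this.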

Cite This Study

Inna Alieksieienko studied this question.

synapsesocial.com/papers/69d34e949c07852e0af982d8
https://doi.org/10.5281/zenodo.19415237
