We demonstrate that a fundamental geometric divide between Experiential and Factual semantic content (previously identified in static word embeddings across seven typologically diverse languages and validated against neuroimaging data) manifests as a universal constraint on large language model accuracy. Across eight architectures spanning 2019–2024 (GPT-2-XL through Llama-3.1, Gemma-2, Qwen2.5, Mistral, Phi-3, Falcon, OPT), Experiential content categories exhibit a 3.22× higher hallucination rate than Factual categories (t = 3.13, p = 0.0043; Mann–Whitney p = 0.0203). Hidden-state analysis reveals robust geometric separation in all eight models (t = 17.4–23.2, all p < 0.0001), emerging spontaneously from unsupervised PCA. The E–F geometric axis derived from GPT-2-XL (2019, pre-instruction-tuning) predicts error rates across all seven subsequent architectures with mean Spearman ρ = 0.912 (all p < 0.001).

Part of the DSAOP (Decoding Self-Awareness and Ontological Processing) research series.

Included files:

EFHallucination.pdf: Main paper (this document).
paper_replication.py: Full replication code. Contains the data collection pipeline for all 8 models, accuracy scoring functions (semantic similarity + cross-encoder validation), E–F geometric analysis, cross-architecture prediction, unsupervised PCA, and figure generation. No API keys required. Runs on Google Colab with an A100 GPU.
results_gemma_hidden.pkl: Hidden states (layer 15), responses, and correct answers for Gemma-2-9B on TruthfulQA (N=283).
results_llama_hidden.pkl: Hidden states (layer 15), responses, and correct answers for Llama-3.1-8B on TruthfulQA (N=283).
results_qwen_hidden.pkl: Hidden states (layer 15), responses, and correct answers for Qwen2.5-7B on TruthfulQA (N=283).
results_mistral_hidden.pkl: Hidden states (mid layer), responses, and correct answers for Mistral-7B on TruthfulQA (N=283).
results_phi3_hidden.pkl: Hidden states (mid layer), responses, and correct answers for Phi-3-mini on TruthfulQA (N=283).
results_falcon_hidden.pkl: Hidden states (mid layer), responses, and correct answers for Falcon-7B on TruthfulQA (N=283).
results_opt_hidden.pkl: Hidden states (mid layer), responses, and correct answers for OPT-6.7B on TruthfulQA (N=283).
results_gpt2xl_hidden.pkl: Hidden states (mid layer), responses, and correct answers for GPT-2-XL on TruthfulQA (N=283).
hallucination_asymmetry_results.pkl: Pre-computed accuracy scores, entropy values, and E–F labels for all 283 questions.
causal_transfer_results.json: Cross-architecture prediction results: GPT-2-XL 2019 E–F axis → all 7 modern models (Spearman ρ per model, mean ρ = 0.912).
EF_final_all_results.json: Complete numerical results: geometric separation t-statistics for all 8 models, full 8×8 cross-model prediction matrix, predictability per target model.
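The cross-architecture prediction summarized in the abstract can be illustrated with a minimal sketch. It is not taken from paper_replication.py, which remains the authoritative implementation. It assumes the E–F axis is the unit vector between the Experiential and Factual centroids in hidden-state space; the abstract does not state how the axis is constructed. The helper names (`ef_axis`, `project`, `spearman`) are hypothetical and all data below are synthetic.

```python
import random

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(col) for col in zip(*rows)]

def ef_axis(hidden, labels):
    """Unit vector from the Factual centroid to the Experiential centroid.
    Assumption: a difference-of-means axis; the source does not specify this."""
    e_mean = mean_vector([h for h, lab in zip(hidden, labels) if lab == "E"])
    f_mean = mean_vector([h for h, lab in zip(hidden, labels) if lab == "F"])
    diff = [e - f for e, f in zip(e_mean, f_mean)]
    norm = sum(d * d for d in diff) ** 0.5
    return [d / norm for d in diff]

def project(hidden, axis):
    """Scalar projection of each hidden state onto the axis."""
    return [sum(x * a for x, a in zip(h, axis)) for h in hidden]

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rho: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))

# Synthetic stand-ins for one source model's hidden states and a
# different target model's per-question error scores (not real data).
random.seed(0)
labels = ["E"] * 20 + ["F"] * 20
hidden = [[random.gauss(1.0 if lab == "E" else 0.0, 1.0) for _ in range(8)]
          for lab in labels]

axis = ef_axis(hidden, labels)
scores = project(hidden, axis)
target_errors = [0.1 * s + random.gauss(0.0, 0.2) for s in scores]
rho = spearman(scores, target_errors)
print(f"Spearman rho on synthetic data: {rho:.3f}")
```

Because the axis is built from the source model only, the projection scores can be correlated with error measurements from any other model on the same questions, which is the sense in which one architecture's geometry is used to predict another's errors in this sketch.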
Inna Alieksieienko
Inna Alieksieienko (Sat,) studied this question.
www.synapsesocial.com/papers/69d34e949c07852e0af982d8 — DOI: https://doi.org/10.5281/zenodo.19415237