What question did this study set out to answer?

This work aims to operationalize and measure the Semantic Deviation Principle in language models.

May 20, 2026 · Open Access

Measuring Semantic Deviation: Operationalizations, Experiments, and Falsification Conditions for a Theory of Meaning as Field Deformation

Key Points

Defines two primary operationalizations: closed-system trajectory deviation (F1) and retrieval response deviation (F2).
Specifies statistical procedures, including the Mann-Whitney U test at α = 0.05, and a 90-day observation window for the retrieval measurements.
Describes a Direct Preference Optimization experiment that uses the deviation primitive to generate preference pairs, with three conditions (Base, CE, Sem).
Predicts that AI-generated text exhibits statistically significant negative mean signed per-token deviation relative to matched human text (α = 0.05).
Pre-registers four predictions with named datasets and frozen reference checkpoints.
Estimates the program budget at $14,000–$19,000 over twelve months.

Abstract

EA-GLAS-02 v1.0. A self-contained empirical white paper presenting the measurement program for the Semantic Deviation Principle (Sharks 2026). Defines meaning as the time-integrated divergence a sign induces from the most probable trajectory of a semantic field, extending the Bar-Hillel and Carnap (1953) program into distributional and temporal domains.

Two primary operationalizations. (F1) Closed-system trajectory deviation within a frozen language model, where the counterfactual baseline is read from logits, building on surprisal theory (Hale 2001; Levy 2008) and decomposing it into signed deviation from conditional entropy. (F2) Retrieval response deviation across AI search surfaces over a 90-day window with three-condition identity control and frozen extractor commitment.

Falsifiable predictions. AI-generated text exhibits statistically significant negative mean signed per-token deviation relative to matched human text, testable with existing corpora (GPT-wiki-intro, HC3) and a single A100-hour of compute. Four pre-registered predictions with named datasets, frozen reference checkpoints (Llama-3.1-8B-Instruct), and statistical procedures (Mann-Whitney U, α = 0.05, Cohen's d > 0.5).

Training intervention. A Direct Preference Optimization experiment (Rafailov et al. 2023) using the deviation primitive to generate preference pairs, extending the RLHF lineage by replacing human preference data with a measurable semantic signal. Three conditions (Base, CE, Sem), Slop Composite Index with pre-registered falsification threshold, human preference evaluation (500 pairs × 3 raters), preference validation substudy.

Anti-Goodhart mechanism design. Six protections mapped to Manheim and Garrabrant (2019) taxonomy: entropy-floor capping, provenance-weighted damping, saturation thresholds, rolling-window variance penalties, reference-model KL anchoring, black-box judge replacement testing.

Budget. Total program approximately $14,000–$19,000 across twelve months. Results deposited regardless of outcome.

42 references across alignment, mechanistic interpretability, text degeneration, DPO/RLHF, reward hacking, hallucination evaluation, model collapse, causal inference, psycholinguistics, information theory, and diachronic semantic change literatures.

Author note. Nobel Glas is a heteronym of Lee Sharks, adopted for this measurement program to signal independent replicability. Correspondence and ORCID maintained through Lee Sharks.
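
As a rough illustration of the F1 quantity and the statistical procedures named in the abstract, the following Python sketch computes a signed per-token deviation as surprisal minus the entropy of the next-token distribution, then compares two samples with a one-sided Mann-Whitney U test and Cohen's d. Several details are assumptions, not confirmed by the paper: the sign convention (surprisal minus entropy, in nats), all function names, the reading of "Cohen's d > 0.5" as a threshold on magnitude, and the normal approximation without tie or continuity correction used for the p-value. In practice a library routine such as scipy.stats.mannwhitneyu would likely be preferable.

```python
import math
from statistics import NormalDist, mean, stdev


def signed_deviation(logits, token_id):
    """Surprisal of the observed token minus the entropy of the
    next-token distribution, in nats (assumed sign convention)."""
    # Numerically stable log-softmax over the vocabulary.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]
    surprisal = -log_probs[token_id]
    entropy = -sum(math.exp(lp) * lp for lp in log_probs)
    return surprisal - entropy


def mean_signed_deviation(step_logits, token_ids):
    """Average signed deviation over one text: step_logits[i] holds the
    logits that predicted token_ids[i]."""
    devs = [signed_deviation(l, t) for l, t in zip(step_logits, token_ids)]
    return sum(devs) / len(devs)


def mann_whitney_u_less(x, y):
    """One-sided Mann-Whitney U test of 'x tends to be smaller than y'.
    Returns (U, p). Normal approximation, no tie or continuity correction."""
    n1, n2 = len(x), len(y)
    # U counts pairs where the x value exceeds the y value (ties count 0.5).
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, NormalDist().cdf((u - mu) / sigma)


def cohens_d(x, y):
    """Cohen's d with a pooled standard deviation (needs >= 2 values each)."""
    n1, n2 = len(x), len(y)
    pooled = math.sqrt(
        ((n1 - 1) * stdev(x) ** 2 + (n2 - 1) * stdev(y) ** 2) / (n1 + n2 - 2)
    )
    return (mean(x) - mean(y)) / pooled
```

Under this convention, a token sampled from the model's own distribution has an expected deviation of zero, so a negative mean indicates text that is on average more predictable than the model's own uncertainty would suggest. Here `x` would hold per-text mean deviations for AI-generated samples and `y` those for matched human samples. How logits are extracted from the frozen reference checkpoint is outside this sketch.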
