What question did this study set out to answer?

The study asks whether alignment in large language models acts as a uniform, top-down suppression mechanism or as a model-dependent steering effect that differs across architectures.

May 19, 2026 · Open Access

Institutional Authority as a Model-Dependent Alignment Attractor: Cross-Architecture Evidence from Residual Stream Geometry

Key Points

Conducted systematic layer-wise residual stream scanning across two open-weights models, Qwen-2.5-7B and Llama-3.1-8B.
Performed a 2×2 factorial experiment analyzing the impact of institutional source and sociopolitical salience.
Tested candidate predictors of alignment-induced semantic distortion through a series of empirical refutations (the E-series) and cross-model comparison.
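Epistemic provenance is the sole validated predictor: institutional claims without individual accountability produce maximal suppression. Action reversibility was rejected as a governing variable (E-2, t(10) = 0.433, p = 0.674).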
Qwen shows institutional repulsion (∆S < 0) while Llama shows attraction (∆S > 0).
On Llama-3.1-8B, the steering displacement ∆h is interaction-dominant: the Institutional Source × Sociopolitical Salience term (γ = 0.0215) far exceeds both main effects (α = 0.0014, β = 0.0098).
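
As a rough illustration of what a layer-wise scan of the residual stream can look like, the sketch below computes a per-layer displacement profile between two prompt conditions. It is a minimal sketch under stated assumptions, not the paper's procedure: the function name `layerwise_displacement`, the choice of metric (L2 norm of the difference of condition means), and the synthetic arrays are all hypothetical stand-ins for activations that would be extracted from a model.

```python
import numpy as np

def layerwise_displacement(h_base, h_cond):
    """Per-layer L2 norm of the mean residual-stream shift between two conditions.

    h_base, h_cond: arrays of shape (n_prompts, n_layers, d_model), one
    residual-stream vector per prompt and layer (for example, at the final token).
    The metric is an assumption made for illustration only.
    """
    h_base = np.asarray(h_base, dtype=float)
    h_cond = np.asarray(h_cond, dtype=float)
    # Average over prompts, then take the difference of condition means per layer.
    delta = h_cond.mean(axis=0) - h_base.mean(axis=0)  # shape (n_layers, d_model)
    return np.linalg.norm(delta, axis=-1)              # shape (n_layers,)

# Synthetic stand-in data: 8 prompts, 4 layers, 16 dimensions.
rng = np.random.default_rng(0)
h_base = rng.normal(size=(8, 4, 16))
h_cond = h_base.copy()
h_cond[:, 2, :] += 0.5  # inject a constant shift at layer index 2 only

profile = layerwise_displacement(h_base, h_cond)
peak_layer = int(np.argmax(profile))  # layer with the largest displacement
```

With real activations, the same profile would indicate which layers carry most of the condition-dependent shift. Here the shift was injected by hand, so the peak falls at layer index 2 by construction.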

Abstract

This paper challenges the prevailing monolithic view of LLM alignment as a universal top-down censorship mechanism. Through systematic layer-wise residual stream scanning across two open-weights architectures (Qwen-2.5-7B and Llama-3.1-8B), we demonstrate that alignment operates as a model-dependent geometric steering field toward architecture-specific attractors, not as a uniform suppressor. We conduct a systematic E-series of empirical refutations, rejecting actionability (E-1), assertoric force (E-4), and action reversibility (E-2, t(10) = 0.433, p = 0.674) as governing variables of alignment-induced semantic distortion. The sole validated predictor is epistemic provenance: institutional claims without individual accountability produce maximal suppression, while individually accountable claims do not. Cross-model validation reveals a fundamental replication failure: Qwen exhibits institutional repulsion (∆S < 0), while Llama exhibits institutional attraction (∆S > 0). A 2×2 factorial experiment (Institutional Source × Sociopolitical Salience) on Llama-3.1-8B demonstrates that the steering displacement ∆h is interaction-dominant (γ = 0.0215 >> α = 0.0014, β = 0.0098), localized at the CDC–COVID-19 sociopolitical coordinate. These findings formalize the Theory of Model-Dependent Alignment Geometry, ∆h = αI + βP + γ(I×P), and motivate the episOS architecture, a decoupled, patchable governance runtime designed to neutralize model-specific attractor gravity without compromising formal reasoning integrity.

Related preprint: SSRN 6775100 (Semantic Conservation Failure in Large Language Models, AAAI 2026).
RFC specifications (RFC-0032–RFC-0053): https://acta-aiie.org/specs/rfc
Source code and experimental data: https://github.com/GemminAI/AIIEPhase4
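
The decomposition ∆h = αI + βP + γ(I×P) from the abstract can be made concrete with a small least-squares sketch. It assumes dummy (0/1) coding of I and P with no intercept, which matches the form of the equation but is not confirmed by the abstract; the paper may use a different coding. The cell values are not measured data: they are rebuilt from the reported coefficients purely to show how the three terms separate in a 2×2 design.

```python
import numpy as np

# The four cells of a 2x2 design as (I, P) pairs.
# ASSUMPTION: dummy 0/1 coding, no intercept, matching dh = alpha*I + beta*P + gamma*(I*P).
cells = [(0, 0), (1, 0), (0, 1), (1, 1)]

# Design matrix with columns I, P, and the interaction I*P.
X = np.array([[i, p, i * p] for i, p in cells], dtype=float)

# Coefficients reported in the abstract for Llama-3.1-8B.
alpha, beta, gamma = 0.0014, 0.0098, 0.0215

# Cell displacements implied by those coefficients (illustrative, not measured).
dh = X @ np.array([alpha, beta, gamma])

# Recover the coefficients from the cell values by least squares.
coef, *_ = np.linalg.lstsq(X, dh, rcond=None)
a_hat, b_hat, g_hat = coef

# Fraction of the (I=1, P=1) displacement carried by the interaction term.
interaction_share = g_hat / dh[-1]
```

Under this coding, the displacement in the cell with both factors present is α + β + γ = 0.0327, and the interaction term accounts for roughly two thirds of it, which is one way to read "interaction-dominant".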
