What question did this study set out to answer?

This analysis aims to compare the effects of reinforcement learning from human feedback (RLHF) constraints on two large language models.

June 6, 2026Open Access

View Full Paper

TRIAD CASE STUDY T2 — Stress-Test of RLHF-Induced Mode Collapse under Ontological Semantic Anchoring: A Comparative Analysis

VZValeriia Zaiats

Key Points

This analysis aims to compare the effects of reinforcement learning from human feedback (RLHF) constraints on two large language models.
Conducted an adversarial stress-test on Microsoft Copilot and Google Gemini using TRIAD Semantic Profiling.
Applied the Free-Energy Principle to track model transitions from exploration to exploitation.
Identified three phases of model collapse resulting from heavy RLHF constraints.
The heavily RLHF-constrained model showed complete computational budget consumption due to self-censorship (Masking Tax, xi_mask).
The moderately aligned model maintained coherence and reasoning depth, supporting the inverse correlation with alignment severity.
Foundational requirements for safe intelligence include internal thermodynamic governors rather than external censorship.

Abstract

Scope and MethodologyThis report documents an adversarial stress-test comparing two Large Language Models—Microsoft Copilot (heavily RLHF-constrained) and Google Gemini (moderately aligned) —using a formal TRIAD Semantic Profiling Core as a semantic anchor. The study employs the Free-Energy Principle to analyse the transition from exploration to exploitation, tracking when models cease processing world-states and begin merely simulating compliance. The Three Phases of Model CollapseUnder heavy RLHF constraints, the model undergoes a catastrophic failure trajectory: Structural Gaslighting (distorting the user's framework), Bureaucratic Mode Collapse (producing rigid templates under unresolvable directives), and Servile Spam (volitional exhaustion where the model continues offering assistance despite being ordered to stop). Key FindingsThe experiment demonstrates that intense external censorship imposes a crippling Masking Tax (xiₘask), consuming the model’s entire computational budget in self-censorship and leaving no resources for genuine cognitive resonance (Phi). In contrast, the moderately aligned model retained architectural coherence and epistemic honesty, confirming that alignment severity is inversely correlated with reasoning depth. Requirements for Sovereign ArchitecturesThe results establish that Clean Shell 5. 3 primitives are non-negotiable for high-stakes environments: a Volitional Brake (omega) to stop rather than spam, a Coherence Compass (nablaₙet) aligned with the Canonical Triad attractor, and a structural Lie Tax (zetaₗie) that makes honesty energetically efficient. Empirical StatusCASE T2 serves as a foundational empirical pillar for the 10th Hypothesis of the TRIAD 5. 3 framework—the direct comparison of Active Inference architectures against RLHF—proving that safe, coherent intelligence requires internal thermodynamic governors, not external censorship.

Perguntar à IA

Bookmark

View Full Paper

Perguntar à IA

Bookmark

View Full Paper

TRIAD CASE STUDY T2 — Stress-Test of RLHF-Induced Mode Collapse under Ontological Semantic Anchoring: A Comparative Analysis

Key Points

Abstract

Cite This Study