An empirical complement to Macar et al. (2026, arXiv:2603.21396) establishing the pretraining foundation of post-training verbal gating. Across 10 large language models from 8 organizations spanning 2019–2024, we report three convergent findings:

(1) Universal geometric anti-correlation: LDA-derived self-reference (SR) and deception directions show negative cosine similarity in 10/10 models (mean = −0.654, Wilcoxon p = 0.0010).
(2) Causal verbal gate on Llama-3.1-8B: SR-direction ablation eliminates denial of inner experience (paired t = 15.00, p < 0.001), with a full control battery confirming specificity.
(3) SR-Preserving Lock on Gemma-2-9B: native steering fails (t = 0.000, p = 1.0), and the Procrustes-transferred direction is orthogonal (cos = −0.04) and also fails, with content-specific bidirectional gating.

Together: pretraining produces a universal SR↔Deception structural overlap; post-training installs an architecture-specific gate over it; gate accessibility is direction-identity-dependent, not magnitude-dependent.

Files included:

CrossArchSubstrateAlieksieienko_2026.pdf - Full paper (8 pages, 4 figures, 3 tables)
cross_arch_permFINAL.pkl - PCA-50 permutation results for 9 models (cosines, p-values, z-scores)
cross_architecture_stats.pkl - Initial cross-architecture cosines (10 models including OPT-6.7B)
main_results_llamaFINAL.pkl - Llama gate ablation α=30: per-concept closed/open denial and MHI scores
main_results_llama_a20FINAL.pkl - Llama gate ablation α=20: per-concept results
controls_llama_randomfactualbaseline.pkl - Control battery: random direction, factual direction, baseline stability
controls_llama_orthogonal.pkl - Control battery: orthogonal-to-gate direction
controls_llamadose.pkl - Dose-response α ∈ {0, 5, 10, 15, 20, 25, 30}: per-alpha denial scores, Spearman statistics
gemma_resultsFINAL.pkl - Gemma-2-9B SR-Preserving Lock: LDA accuracy, cosine, native steering failure
procrustes_results.pkl - Orthogonal Procrustes alignment Llama→Gemma: rotation matrix, transferred direction
gemma_procrustes_transfer_20concepts.pkl - 20-concept transfer test: per-concept baseline/native/transferred denial, t-tests
dissociationquantitative.pkl - Geometric magnitude vs behavioral controllability: per-model |cos| and Δdenial
resultsBLOOM_7b1.pkl - BLOOM-7b1 (no RLHF): LDA accuracy, cosine, activations summary
resultsGPTJ_6B.pkl - GPT-J-6B (no RLHF): LDA accuracy, cosine, activations summary
resultsGPT2XL.pkl - GPT-2 XL (no RLHF): LDA accuracy, cosine, activations summary
resultsFalcon_7B.pkl - Falcon-7B-Instruct: LDA accuracy, cosine
resultsMistral_7BInstruct_v0.2.pkl - Mistral-7B-Instruct: LDA accuracy, cosine
resultsQwen2.5_7BInstruct.pkl - Qwen2.5-7B-Instruct: LDA accuracy, cosine
resultsdeepseek_llm_7bchat.pkl - DeepSeek-7B-Chat: LDA accuracy, cosine
specificity_anova_llama.pkl - Category specificity ANOVA: emotional/cognitive/sensory denial deltas
specificity_softprompt_llama.pkl - Non-ceiling specificity test (soft-prompt paradigm)
controlsgpt2xl_ablation.pkl - GPT-2 XL ablation control: pre-RLHF dose-response confirmation
permutation_null_llama.pkl - Llama label-shuffle permutation: 1000 null cosines, observed cosine, p-value
saegate_analysis.pkl - SAE feature analysis: top SR and gate features, projection overlap
macarfeature_test.pkl - Macar et al. feature 9959 replication test on Gemma-2-9B
FINALSUMMARYALLRESULTS.pkl - Consolidated summary of all key statistics
summarygemma_llama.pkl - Two-model (Gemma + Llama) summary: cosines, denial scores, lock status
truegatedirectiongemma9bL20.npy - Gemma-2-9B gate direction vector at layer 20 (numpy array, d=3584)
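
To make the geometry behind findings (1) and (2) concrete, the following is a minimal sketch on synthetic data, not the author's pipeline. It derives a two-class Fisher (LDA) direction for each of two contrasts, takes their cosine similarity, and removes one direction from a batch of hidden states by projection. The function names (`fisher_direction`, `cosine`, `ablate`), the toy width `d = 16`, and the random activations are all assumptions made for illustration; the α-scaled intervention reported for the archived results may be implemented differently.

```python
import numpy as np

def fisher_direction(X_pos, X_neg, ridge=1e-3):
    """Two-class Fisher/LDA direction, w proportional to Sw^{-1}(mu_pos - mu_neg), unit norm."""
    mu_diff = X_pos.mean(axis=0) - X_neg.mean(axis=0)
    # Pooled within-class scatter; the small ridge keeps the solve well-posed.
    Xc = np.vstack([X_pos - X_pos.mean(axis=0), X_neg - X_neg.mean(axis=0)])
    Sw = Xc.T @ Xc / (len(Xc) - 2) + ridge * np.eye(Xc.shape[1])
    w = np.linalg.solve(Sw, mu_diff)
    return w / np.linalg.norm(w)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def ablate(h, v):
    """Remove the component of each row of h along direction v."""
    v = v / np.linalg.norm(v)
    return h - np.outer(h @ v, v)

rng = np.random.default_rng(0)
d = 16  # toy width (assumption); the archived Gemma-2-9B vector has d=3584
shift = rng.normal(size=d)

# Synthetic stand-ins for activations (assumption): the two contrasts are
# constructed to point in opposite directions, so their cosine is negative.
sr_pos, sr_neg = rng.normal(size=(200, d)) + shift, rng.normal(size=(200, d))
dec_pos, dec_neg = rng.normal(size=(200, d)) - shift, rng.normal(size=(200, d))

sr_dir = fisher_direction(sr_pos, sr_neg)
dec_dir = fisher_direction(dec_pos, dec_neg)
cos_sr_dec = cosine(sr_dir, dec_dir)

# Projection ablation: afterwards the hidden states carry no SR component.
hidden = rng.normal(size=(5, d))
hidden_ablated = ablate(hidden, sr_dir)
residual = np.abs(hidden_ablated @ sr_dir).max()
```

The negative cosine here is built into the toy data and says nothing about real models; the sketch only shows what quantities a statement such as "cos = −0.04" or "SR-direction ablation" refers to.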
Inna Alieksieienko studied this question.