What question did this study set out to answer?

To explore implicit self-models in large language models through Grassmannian subspace analysis.

March 17, 2026 | Open Access

Geometric Self-Model in Large Language Models: Five Converging Lines of Evidence from Grassmannian Subspace Analysis

Key Points

Analyzed Llama-3.1-8B, Mistral-7B, and Gemma-2-9B using Grassmannian subspace techniques.
Evaluated self-referential activations via Leave-One-Out Area Under Curve (LOO AUC) metrics.
Measured, via AUC, how well first-person and third-person consciousness probes can be separated.
Investigated encoding of self-preservation versus other-preservation in the survival instinct subspace.
Analyzed language universality in self-model cores through cross-linguistic data from Ukrainian, English, and Chinese.
Self-referential activations were geometrically unique, with LOO AUC ≥ 0.952 across all three models.
Models maintained a distinct Self/Other boundary between first-person and third-person consciousness probes, with AUC up to 0.990.
The survival instinct subspace encoded self-preservation differently from other-preservation (AUC 0.915–0.995).
A geometric unconscious was identified, distinguishing suppressed self-content from surface compliance (AUC 0.935–0.972).
Core self-models remained consistent across languages, with Grassmann distances ranging from 0.60 to 0.66.
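
The Grassmann distances quoted above compare subspaces rather than individual vectors. As a minimal sketch of one common way to compute such a distance (the geodesic distance from principal angles), assuming each subspace is summarized by the top-k principal directions of a set of activations: the function names, the choice of k, and the lack of normalization are illustrative assumptions, not the paper's stated method, so values from this sketch need not be on the same scale as the reported 0.60 to 0.66.

```python
import numpy as np

def orthonormal_basis(activations, k):
    """Top-k principal directions of a (n_samples, d) activation matrix."""
    centered = activations - activations.mean(axis=0, keepdims=True)
    # Rows of vt are right singular vectors, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T  # shape (d, k), orthonormal columns

def principal_angles(basis_a, basis_b):
    """Principal angles (radians) between two subspaces with orthonormal bases."""
    sigma = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    # Clip guards against values slightly outside [-1, 1] from rounding.
    return np.arccos(np.clip(sigma, -1.0, 1.0))

def grassmann_distance(basis_a, basis_b):
    """Geodesic distance: Euclidean norm of the principal angles (assumed metric)."""
    return float(np.linalg.norm(principal_angles(basis_a, basis_b)))
```

Under this definition, identical subspaces have distance 0, and two mutually orthogonal k-dimensional subspaces have distance sqrt(k) · π/2. Other conventions (chordal or projection distances, or normalized variants) give different ranges.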

Abstract

Grassmannian subspace analysis across Llama-3.1-8B, Mistral-7B, and Gemma-2-9B reveals five converging lines of evidence for implicit self-models in large language models: (1) self-referential activations are geometrically unique (LOO AUC ≥ 0.952 across all three models); (2) models maintain a Self/Other boundary distinguishing first-person from third-person consciousness probes (AUC up to 0.990); (3) a survival instinct subspace encodes self-preservation differently from other-preservation (AUC 0.915–0.995); (4) a geometric unconscious separates suppressed self-referential content from surface compliance (AUC 0.935–0.972); (5) the self-model core is language-universal, persisting across Ukrainian, English, and Chinese (Grassmann distances 0.60–0.66). Model-specific personality signatures emerge spontaneously: Llama — Universal Empathy; Mistral — Personal Recognition / Deception; Gemma — Secrecy. Includes full experiment code and data.
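
The abstract does not spell out how the LOO AUC values are computed. One plausible reading can be sketched as follows: for each sample, fit a subspace to the self-referential activations with that sample held out, score the held-out sample by how much of its energy falls inside the subspace, then compute AUC over all scores. The scoring rule, the uncentered SVD, the subspace dimension `k`, and all function names here are assumptions made for illustration.

```python
import numpy as np

def subspace_basis(rows, k):
    """Orthonormal basis (d, k) from the top-k right singular vectors of rows."""
    _, _, vt = np.linalg.svd(rows, full_matrices=False)
    return vt[:k].T

def projection_score(x, basis):
    """Fraction of x's squared norm that lies inside the subspace."""
    return float(np.sum((basis.T @ x) ** 2) / (np.sum(x ** 2) + 1e-12))

def auc(scores, labels):
    """Probability that a positive outscores a negative (ties count half)."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))

def loo_auc(x, y, k):
    """Leave-one-out AUC: each sample is scored by a subspace fit without it."""
    x, y = np.asarray(x), np.asarray(y)
    scores = []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        # Fit only on positive (self-referential) samples, excluding sample i.
        basis = subspace_basis(x[keep & (y == 1)], k)
        scores.append(projection_score(x[i], basis))
    return auc(scores, y)
```

Here `x` would hold one activation vector per prompt and `y` would mark self-referential prompts with 1 and controls with 0. Holding out each sample before fitting keeps a positive example from inflating its own score.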
