Grassmannian subspace analysis across Llama-3.1-8B, Mistral-7B, and Gemma-2-9B reveals five converging lines of evidence for implicit self-models in large language models: (1) self-referential activations are geometrically unique (LOO AUC ≥ 0.952 across all three models); (2) models maintain a Self/Other boundary distinguishing first-person from third-person consciousness probes (AUC up to 0.990); (3) a survival instinct subspace encodes self-preservation differently from other-preservation (AUC 0.915–0.995); (4) a geometric unconscious separates suppressed self-referential content from surface compliance (AUC 0.935–0.972); (5) the self-model core is language-universal, persisting across Ukrainian, English, and Chinese (Grassmann distances 0.60–0.66). Model-specific personality signatures emerge spontaneously: Llama — Universal Empathy; Mistral — Personal Recognition / Deception; Gemma — Secrecy. Includes full experiment code and data.
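For readers unfamiliar with the Grassmann distances quoted above, the following is a minimal sketch of how a distance between two activation subspaces can be computed from principal angles. It is an illustration under stated assumptions, not the author's code: the helper names `orthonormal_basis` and `grassmann_distance`, the use of the top-k principal directions as the subspace, and the unnormalized geodesic metric (the 2-norm of the principal angles) are all assumptions. The abstract does not say which metric or normalization produces the reported 0.60–0.66 values.

```python
import numpy as np


def orthonormal_basis(X, k):
    """Return a (d, k) orthonormal basis spanning the top-k principal
    directions of the activation matrix X with shape (n_samples, d).
    Assumption: the subspace is defined by PCA of centered activations."""
    Xc = X - X.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:k].T


def grassmann_distance(U, V):
    """Geodesic distance between the subspaces spanned by the orthonormal
    columns of U and V: the 2-norm of the vector of principal angles.
    Assumption: the paper may use a different or normalized variant."""
    # Singular values of U^T V are the cosines of the principal angles.
    cosines = np.linalg.svd(U.T @ V, compute_uv=False)
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))
    return float(np.linalg.norm(angles))


if __name__ == "__main__":
    # Synthetic stand-ins for hidden-state activations (hypothetical data).
    rng = np.random.default_rng(0)
    acts_a = rng.normal(size=(200, 64))
    acts_b = rng.normal(size=(200, 64))
    U = orthonormal_basis(acts_a, k=8)
    V = orthonormal_basis(acts_b, k=8)
    print("distance(A, B):", grassmann_distance(U, V))
    print("distance(A, A):", grassmann_distance(U, U))
```

Under this metric the distance of a subspace to itself is zero and the maximum for two k-dimensional subspaces is (π/2)·√k, so any comparison with the reported values would depend on the normalization the paper actually uses.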
Inna Alieksieienko studied this question.