We investigate whether large language models of different architectures encode semantic structure in geometrically equivalent ways in their residual streams. Using raw residual activations (no sparse autoencoders), we test cross-architecture alignment across five models spanning four families (Gemma, Llama, Qwen, Mistral; 8B–123B parameters). Three findings are robust: (1) all tested models perfectly separate a structured 64-prompt probe set from 50 Wikipedia controls (ARI = 1.0 at all tested layers); (2) semantic domain labels are linearly decodable within each model at 60–80% accuracy (chance = 25%); and (3) representational similarity between independently trained models is high (CKA ≥ 0.97 between Llama 3.3 70B and Qwen 2.5 72B).

We also identify a methodological problem in a commonly used alignment measure: the standard pipeline of PCA(50) + Procrustes on 64 points produces near-perfect cross-model transfer (95–100%) even for random-label controls, making it uninformative as evidence for shared geometry. A constrained analysis with PCA(5) reveals honest transfer rates: 66% for the original probe set, 94% for recipe cuisine types, 52% for animals, and 52% for random-label controls.

We conclude that cross-model semantic geometry is partially shared (coherent taxonomies produce above-chance transfer while random controls do not), but that Procrustes alignment at standard PCA dimensionalities is unreliable with small sample sizes. We propose a constrained protocol (multiple PCA dimensionalities with random-label controls) as a necessary methodological check for future cross-model alignment studies.
Juan Jacobo Jimenez Sanchez studied this question.
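
The constrained protocol proposed in the abstract (several PCA dimensionalities, each paired with a random-label control) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration rather than the study's actual pipeline: the function names, the nearest-centroid readout used to score "transfer", the number of label shuffles, and the choice to fit and score the Procrustes map on the same points.

```python
import numpy as np


def pca_reduce(x, k):
    """Project the rows of x onto their top-k principal components."""
    xc = x - x.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(xc, full_matrices=False)
    return u[:, :k] * s[:k]


def procrustes_rotation(a, b):
    """Orthogonal matrix R minimizing the Frobenius norm of (a @ R - b)."""
    u, _, vt = np.linalg.svd(a.T @ b)
    return u @ vt


def nearest_centroid_predict(train_x, train_y, test_x):
    """Assign each test row the label of the closest class centroid."""
    classes = np.unique(train_y)
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    dists = ((test_x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]


def transfer_accuracy(acts_a, acts_b, labels, k):
    """Align model A to model B in PCA(k) space, then read labels out in B's space."""
    za, zb = pca_reduce(acts_a, k), pca_reduce(acts_b, k)
    rot = procrustes_rotation(za, zb)
    pred = nearest_centroid_predict(zb, labels, za @ rot)
    return float((pred == labels).mean())


def constrained_protocol(acts_a, acts_b, labels, dims=(5, 50), n_shuffles=20, seed=0):
    """Report real-label and random-label transfer at each PCA dimensionality."""
    rng = np.random.default_rng(seed)
    report = {}
    for k in dims:
        real = transfer_accuracy(acts_a, acts_b, labels, k)
        # Control: same activations, labels shuffled to destroy semantic structure.
        control = np.mean([
            transfer_accuracy(acts_a, acts_b, rng.permutation(labels), k)
            for _ in range(n_shuffles)
        ])
        report[k] = {"real": real, "random_label": float(control)}
    return report
```

Under these assumptions, a dimensionality is informative only when the `real` score clearly exceeds the `random_label` score. If the two are close at a given `k`, the alignment at that dimensionality says little about shared geometry.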