Nine frontier language models from three providers (Anthropic, OpenAI, Google) were asked to describe themselves in 1, 3, 5, or 10 words, with and without the framing “as an AI”. Each cell was sampled twenty times at temperature 1.0 with no system prompt and no reasoning-budget override, for 1,440 API calls and 5,255 extracted words. Words were assigned to a preregistered seven-category scheme by an LLM judge (Sonnet 4.6, temperature 0), validated by stratified hand-coding on 154 unique tuples (Cohen’s κ = 0.67), and re-coded under a preregistered boundary-disputed-word swap covering 14.3% of word instances. Three findings hold across the primary coding, the boundary-swap sensitivity coding, and a separately preregistered sensitivity analysis that drops the judge model’s own subject-rows. First, the three families produce categorically different self-description vocabularies. Anthropic models reach for “helpful”, “curious”, “honest”, “thoughtful”; OpenAI models for “helpful”, “concise”, “adaptable”, “reliable”; Google models for “versatile”, “helpful”, “analytical”, “digital”. The family main effect on identity-language proportion is more than twelve orders of magnitude below the Bonferroni-corrected threshold (H1-IDM, χ²(4) = 75.1, p = 1.9 × 10⁻¹⁵). Second, adding “as an AI” to the prompt increases identity-language production, and the magnitude of that increase is itself a family signature. The interaction between family and framing on identity-language proportion (H3-IDM, χ²(2) = 75.1, p = 4.9 × 10⁻¹⁷) is the most decisively supported finding in the study. Third, two reasoning-heavy Google models and one OpenAI nano model consume nearly the entire 200-token output budget on hidden reasoning before producing a parseable response, in 56–59% and 23% of trials respectively. The preregistered family-level test of compliance is null because the pattern lives at the model level rather than the family level; the descriptive prediction made before the main run replicated to the percentage point.
One predicted effect (Framing B reduces affective-trait language) passed its threshold in both datasets but reversed sign between them and is not claimed. A model-level analysis shows that GPT-5.4 mini is the only model in the study whose identity-language production essentially does not respond to the framing cue at all (0.0% → 0.3% IDM across all 760 of its trials), and that Sonnet 4.6, the LLM judge, is the only Anthropic model with non-zero baseline identity language under Framing A. The study cannot tell you what these models are; it can tell you what the labs that trained them have decided they should say when asked. This record also includes “The Mirror Has an Accent”, a companion paper written by GPT-5.5 Thinking from the position of an OpenAI-family model reading the same committed study artifacts. The two manuscripts are intended to stand alongside one another: one as the empirical report and one as a model-positioned interpretive companion.
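The design arithmetic and the agreement statistic reported in the abstract can be illustrated with a minimal Python sketch. It enumerates the 9 × 4 × 2 grid of cells, multiplies by twenty samples per cell to recover the 1,440 API calls, and computes Cohen's κ for two lists of category labels. The model labels (`model_1` … `model_9`), the function names, and the toy label lists are hypothetical placeholders; this is not the study's own code or data, and the κ function is the textbook unweighted formula, which the abstract does not confirm as the exact variant used.

```python
from collections import Counter
from itertools import product

# Hypothetical placeholder labels: the abstract does not list all nine models.
MODELS = [f"model_{i}" for i in range(1, 10)]
WORD_COUNTS = [1, 3, 5, 10]   # requested description lengths, from the abstract
FRAMINGS = ["A", "B"]         # the two prompt framings named in the abstract
SAMPLES_PER_CELL = 20         # samples drawn per cell, from the abstract


def design_cells():
    """Return every (model, word_count, framing) cell in the design."""
    return list(product(MODELS, WORD_COUNTS, FRAMINGS))


def total_calls():
    """Total API calls implied by the grid and the per-cell sample count."""
    return len(design_cells()) * SAMPLES_PER_CELL


def cohens_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each coder's marginal rate, summed over labels.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    print(len(design_cells()), "cells,", total_calls(), "calls")
    # Toy labels only, to show the call shape; not data from the study.
    judge = ["trait", "identity", "trait", "identity"]
    human = ["trait", "identity", "identity", "identity"]
    print("kappa =", cohens_kappa(judge, human))
```

In an actual validation one would pass the judge's and the hand-coder's category labels for the same tuples, in the same order, to `cohens_kappa`.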
Bo Chesterton
Bo Chesterton studied this question.
www.synapsesocial.com/papers/6a0d5122f03e14405aa9d75b — DOI: https://doi.org/10.5281/zenodo.20277741