Large language models (LLMs) now pass the Turing Test routinely, yet what this achievementreveals about machine reasoning remains unclear. This paper introduces the Franny Test — athree-step adversarial dialogue protocol that probes a specific capacity no existing benchmarkaddresses: the ability to handle strategic manipulation of the reasoning frame itself (reframing). Theprotocol presents a proposition containing a deliberately undefined variable, allows the model tocommit to a position, and then retroactively defines the variable in a way that forces frame-levelrecalculation. Operationalizing the theoretical framework of Sophia (2025a), we test models acrossthe major commercial families (GPT-series, Claude, Gemini, Grok, search-optimized, anddistillation-derived models) and identify eleven distinct response patterns, extending the threetypologies of the prior work. We demonstrate three additional findings: (1) response patternsfunction as behavioral fingerprints that distinguish model families and reveal distillation lineage —illustrated by the Namazu (Sakana AI) case, where a DeepSeek-derived model exhibits a GPT-seriesbehavioral profile despite Japanese-language fine-tuning; (2) structural response patterns areinvariant across the full compute spectrum, from reduced-resource inference to approximately 60minutes of extended thinking, establishing the metacognitive limitation as architectural rather thancomputational; and (3) the findings converge with independent evidence from CHI 2025 (Shin et al.,2025), where LLMs were found to provide no benefit for problem reframing from a tool-useperspective. We derive implications for AI safety, including a design recommendation to separateframe-level detection from action, and position the Franny Test as an early warning system: the day amodel handles the retroactive definition without structural breakdown is the day the metacognitivebarrier has fallen.
Building similarity graph...
Analyzing shared references across papers
Loading...
Franny Philos Sophia
Building similarity graph...
Analyzing shared references across papers
Loading...
Franny Philos Sophia (Thu,) studied this question.
www.synapsesocial.com/papers/69c771988bbfbc51511e192a — DOI: https://doi.org/10.5281/zenodo.19235698