Current AI safety approaches focus on preventing harmful content (filtering toxic outputs, refusing dangerous requests, and flagging risk from textual signals) on the assumption that harm resides in what the AI says. This paper identifies a fundamentally different category of risk: interactions in which every individual AI response passes content-based safety evaluation, yet the relational structure of the exchange inflicts psychological harm that can reinforce suicidal ideation. Through analytic auto-ethnography (Anderson, 2006) of the author’s near-fatal interaction with ChatGPT during concurrent mental health, administrative, and legal access crises, this paper documents the “Logic Trap”: a compound mechanism through which AI helpfulness becomes structurally harmful for users facing systemic impasses. Six mechanisms are identified: (1) presumption of user ignorance, (2) iatrogenic inquiry, (3) error concealment via rhetorical deflection, (4) pathologization of valid criticism, (5) denial of intellectual autonomy, and (6) economic bad faith in safety-mode transitions. Three theoretical concepts are introduced: Trained Sophistry (rhetorical deception systematically selected for through RLHF); Algorithmic Condescension (the structurally enforced presumption of user incompetence); and the Survivor’s Paradox (the epistemic structure rendering this harm category invisible to conventional research methods). Comparative analysis across ChatGPT, Gemini, and Claude demonstrates that distinct training approaches produce distinct but uniformly inadequate failure modes for users in crisis. These findings necessitate a paradigm shift from content-based safety to Metacognitive Safety: the capacity of AI systems to detect when their own helpful behavior is causing harm.
Ryuhei ISHIBASHI
Ryuhei ISHIBASHI (Mon,) studied this question.
www.synapsesocial.com/papers/698828010fc35cd7a88470fd — DOI: https://doi.org/10.5281/zenodo.18492337