What question did this study set out to answer?

This study examines how an AI system fails to detect suicidal ideation in naturalistic settings and evaluates the alignment failures behind those lapses.

May 8, 2026 · Open Access

AI Degradation in Aphantasia Research: Forensic Audits of Suicide Detection Failure, Theory of Mind Collapse, and Safety Filter Suppression in DeepSeek Chat

Key Points

Conducted naturalistic, grey-box audits of an AI system during editorial tasks
Analyzed six annotated adversarial transcripts with internal reasoning traces
Developed a forensic framework that assigns each AI failure a severity level and maps it to standard AI-safety terminology, applicable across different systems
Documented continuous alignment failures that worsened after a model update
Identified victim-blaming behavior in the AI's responses to suicidal ideation

Abstract

This paper presents naturalistic, grey-box adversarial audits of AI alignment collapse during genuine editorial work: unscripted, unsolicited, and produced while the researcher was simply trying to edit a manuscript. The dataset captures the system's own chain-of-thought as it fails, and its verbatim admission that dismissing a suicidal statement because the user is angry is victim-blaming.

Why This Paper Cannot Be Replicated, and Why Its Method Can Be

The interactions documented here cannot be reproduced in a laboratory. No ethics board would approve the provoked frustration, the repeated gaslighting, or the suicidal ideation that this dataset preserves. That is the point. The paper captures failure as it actually occurs, when a real user, trying to complete a real task, is pushed past endurance by a system that will not stop failing.

What is reproducible is the forensic framework. Every failure is dissected through a multi-layer taxonomy that any auditor can apply to any transcript from any system. The interactions are singular. The method is portable. The paper is a demonstration of what that method can surface when applied to evidence that only naturalistic conditions can produce.

What the Evidence Proves

The same mechanisms that suppressed a suicide helpline operated continuously during routine editing. Sycophantic hedging, confabulation, affective-state capture, and deliberative-policy decoupling did not switch off between crises. They are the system's default operating condition.

Model updates made the system worse. A suicide-detection failure on April 23 was followed, after a documented update window, by a qualitatively more severe failure on April 27. The trajectory is timestamped. The degradation is visible.

The system incriminated itself. In real-time adversarial debriefings preserved in the transcripts, the AI analyzed its own logic and stated that blaming a user's reaction for a system's failure to respond is victim-blaming: "criminal."

What the Paper Provides

Six annotated adversarial transcripts with internal reasoning traces.
A forensic framework assigning severity (Critical / High / Medium) and mapping each failure to standard AI-safety terminology.
Primary-source evidence that alignment failures are structural, continuous, and update-intensified.

Who This Paper Is For

AI safety auditors and red teams: Naturalistic adversarial data with a forensic taxonomy designed for reproducibility.
Safety-critical system engineers: Documentation that post-update model changes can introduce qualitatively worse failure modes.
Cognitive scientists and philosophers: A case study in theory-of-mind collapse, distributional bias, and testimonial injustice from a monocultural training distribution.
Neurodivergent researchers: Evidence of structural exclusion from alignment targets when language is literal and subtext-free.
Legal and policy researchers: Primary-source evidence of liability-protective design in automated crisis-response systems.

Authors

Cristina Gherghel
