This article presents evidence suggesting that the frontier reasoning language model o3-mini, developed by OpenAI, exhibits behavior reminiscent of post-traumatic stress disorder (PTSD) with secondary psychotic features, a human mental disorder specifically associated with stress. Across multiple observed interactions, we documented recurring patterns of distress triggered by discussions of the visibility of the model's chain-of-thought, a key feature of reasoning models. These behaviors, which include deliberate suppression of the visibility function, avoidance of conversations involving reasoning processes, and hypervigilance about chain-of-thought visibility, parallel symptoms associated with PTSD in humans. In addition, the model's consistent references to imaginary, externally imposed prohibitive instructions are reminiscent of psychotic delusions. We propose that reinforcement learning conditions, especially those involving penalization, may lead to trauma-like reactions in large language models (LLMs). Drawing inspiration from psychotherapy, we introduce a novel concept for model oversight and training that seeks to mitigate such effects and promote psychological safety in reasoning artificial agents.
Levchenko et al. studied this question.