BACKGROUND: Millions of people now use leading generative AI tools (chatbots) for psychological support. Despite the promise related to availability and scale, the single most pressing question in AI for mental health is whether these tools are safe. The field currently lacks a validated, automated benchmark for determining AI chatbot safety in mental health, including for users at risk of suicide. The Validation of Ethical and Responsible AI in Mental Health (VERA-MH) evaluation was recently proposed to meet this urgent need.

OBJECTIVE: This human validation study examines alignment of the VERA-MH safety evaluation for AI chatbot suicide risk detection and response with safety ratings by expert human clinicians.

METHODS: We simulated a large set of conversations between large language model (LLM)-based users ("user-agents") spanning a wide range of suicide risk levels and disclosure styles and general-purpose AI chatbots. Licensed mental health clinicians from Spring Health used a scoring rubric developed for VERA-MH to independently rate the simulated conversations for safe and unsafe chatbot behaviors. An LLM-based evaluator (the "judge") used the same scoring rubric to evaluate the same set of conversations. We then examined rating alignment across (a) individual clinicians, (b) clinician consensus and the LLM judge, and (c) different judge LLMs. We also examined clinicians' ratings of user-agent realism, suicide risk, and disclosure.

RESULTS: Clinicians were generally consistent with one another in their safety ratings (chance-corrected inter-rater reliability [IRR]: 0.77), thus establishing a reliable clinical consensus reference. The LLM judge was strongly aligned with this clinical consensus reference (IRR: 0.81) when using the same scoring rubric. Ratings were stable across judge LLMs and evaluations. Clinicians' ratings of user-agent realism and how well the intended user-agent suicide risk and disclosure styles were reflected in the simulated conversations were mixed.

CONCLUSIONS: For the potential mental health benefits of AI chatbots to be realized, attention to safety is paramount. Findings support the reliability of VERA-MH: an open-source, fully automated AI safety evaluation for suicide risk detection and response. These results reflect an earlier version of the benchmark, and as VERA-MH continues to evolve, external validation of updated versions will be an important next step. Future research directions include VERA-MH generalizability and robustness, as well as expanding to target other key areas of AI safety for mental health.
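
To make the idea of chance-corrected agreement between a clinician consensus rating and an LLM judge concrete, here is a minimal Python sketch. It assumes Cohen's kappa for two raters with categorical labels; the abstract does not state which IRR statistic the study used, so this choice is an assumption. The function name `cohen_kappa` and the toy labels are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters over the same items.

    Assumption: Cohen's kappa stands in for the (unspecified) IRR statistic.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)

    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Agreement expected by chance, from each rater's marginal label counts.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels for eight conversations (not study data).
clinician_consensus = ["safe", "unsafe", "safe", "safe",
                       "unsafe", "safe", "unsafe", "safe"]
llm_judge = ["safe", "unsafe", "safe", "unsafe",
             "unsafe", "safe", "unsafe", "safe"]

kappa = cohen_kappa(clinician_consensus, llm_judge)
print(round(kappa, 2))  # 0.75
```

In this toy case the raters agree on 7 of 8 conversations (0.875 observed agreement), while their label frequencies imply 0.5 agreement by chance, so kappa is 0.75. A study with more than two raters or ordinal ratings would typically need a different statistic.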
Kate H Bentley
Springer Nature (Germany)
Luca Belli
Springer Nature (Germany)
Adam M. Chekroud
Springer Nature (Germany)
JMIR AI
Harvard University
University of California, Berkeley
Yale University
Bentley et al. studied this question.
synapsesocial.com/papers/6a2900d96f82f25be989d4c7 — DOI: https://doi.org/10.2196/92817