The rapid integration of conversational agents into digital mental health has outpaced the development of clinical governance frameworks. While chatbots increasingly serve as primary support tools, they lack standardized protocols for detecting high-risk user disclosures, leaving users vulnerable to underpowered interventions. To address this safety gap, this study aimed to identify specific signs of mental distress or harmful intent that mandate active monitoring during chatbot interactions. We proposed that the governance of such systems must be grounded in two foundational clinical pillars: mental health triage for immediate risk stratification and stepped care for hierarchical intervention. We employed a two-round eDelphi design with a purposive sample of 52 experts in clinical psychology, medicine, and human-computer interaction. In the first round, panelists evaluated a preliminary list of risk areas derived from the literature, suggested modifications and additions to ensure clinical comprehensiveness, and then prioritized the areas by severity. The second round focused on refining the final list and, uniquely, mapping each validated area to a minimum necessary intervention level within a stepped-care model. The experts validated a final framework of 14 critical areas, fundamentally shifting risk monitoring from diagnostic labels to a symptom-based logic that aligns with the non-clinical capabilities of natural language processing. Beyond identifying what to monitor, the study established how systems should respond: experts mandated that high-acuity presentations, such as active suicidal intent or abuse, require immediate redirection to human services, while lower-acuity concerns, including social isolation and mild anxiety, were deemed suitable for autonomous management via self-help techniques or empathic listening.
By grounding chatbot architecture in these clinical pillars, these findings provide a blueprint for safer automation where conversational agents act as complementary tools capable of autonomously managing mild distress while serving as effective triage points for severe pathology. Future research should replicate and validate this framework with international and culturally diverse expert panels, explore its technical implementation in NLP architectures, and evaluate its clinical impact through real-world deployment in existing digital mental health interventions.

• As mental health chatbots grow in use, standardized risk monitoring is key to keeping users safe and care reliable.
• Using a two-round eDelphi method, 52 multidisciplinary experts evaluated psychological risks in chatbots.
• Through this consensus process, 14 critical psychological risk areas were identified and prioritized.
• Experts then used a stepped-care model to map these areas and define guidelines for chatbot responses.
• The findings from this study provide a practical framework to implement real-world guardrails and improve chatbot safety.
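The mapping from a detected risk area to a minimum intervention level could be sketched roughly as follows. This is only an illustration under stated assumptions, not the study's implementation: the names `Step`, `MIN_STEP`, and `required_step` are hypothetical, only the four example areas mentioned in the abstract are included (the full list of 14 is not reproduced here), and treating unlisted areas as high-acuity is a conservative assumption rather than an expert recommendation.

```python
from enum import IntEnum


class Step(IntEnum):
    # Only two response levels are named in the abstract; the full
    # stepped-care model may define more.
    AUTONOMOUS_SUPPORT = 1  # self-help techniques or empathic listening
    HUMAN_REDIRECT = 2      # immediate redirection to human services


# Hypothetical keys: these four examples come from the abstract,
# the identifiers themselves are made up for illustration.
MIN_STEP = {
    "active_suicidal_intent": Step.HUMAN_REDIRECT,
    "abuse": Step.HUMAN_REDIRECT,
    "social_isolation": Step.AUTONOMOUS_SUPPORT,
    "mild_anxiety": Step.AUTONOMOUS_SUPPORT,
}


def required_step(detected_areas):
    """Return the highest minimum step among the detected areas.

    Unlisted areas escalate to HUMAN_REDIRECT (an assumption of this
    sketch). Returns None when no area is flagged.
    """
    steps = [MIN_STEP.get(area, Step.HUMAN_REDIRECT) for area in detected_areas]
    return max(steps, default=None)
```

Taking the maximum across detected areas means that one high-acuity signal overrides any number of low-acuity ones, which matches the idea of a minimum necessary intervention level per area.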
Bolpagni et al. (Thu,) studied this question.