Objective Identifying suicide and self-injury remains a medical and public health priority, particularly among underserved populations receiving care in safety-net psychiatric hospitals. However, clinical data are not readily available to support the rapid advancement of research that uses artificial intelligence (AI) methods for insight generation. To further our understanding of suicidal events and related factors documented in clinical notes within the context of psychiatric safety-net hospitals, we aim to develop a gold-standard corpus of suicidality, perform a manual content analysis and deploy natural language processing algorithms to automate text classification.
Methods and analysis A multidisciplinary panel developed an annotation guideline to capture four key suicide-related factors: suicidal ideation (SI), suicide attempt (SA), exposure to suicide and non-suicidal self-injury. We created an annotated corpus of 500 notes through a clinically validated annotation process and performed cohort analysis to characterise demographic and suicidal distributions. A pretrained language model was deployed for automatic classification.
Results The annotated corpus was created with a Cohen’s kappa of 0.95 and further de-identified for data sharing. Most notes (79.4%) contained one (34.4%) or more than one (45%) suicide-related label, with co-occurrence of SI and SA as the most frequent combination (35.6%), indicating substantial overlap between the two. The cohort was characterised by a mean age of 33.4 years; 51.7% were male and 75.8% were single. Prevalent stressors included unemployment (24.2%), homelessness (12.0%), limited healthcare access (5.4%) and legal challenges (5.0%). We identified four key insights for improving the documentation of suicidality: implicitness, confliction, ambiguity and incomplete definition coverage. The baseline model achieved a micro-averaged F1 score of 0.70, demonstrating satisfactory performance in multi-label classification.
Conclusion The near-perfect inter-annotator agreement underscores the reliability of the proposed annotation process and the quality of the resulting data. Cohort analysis highlights the distribution of suicidality and yields insights into how it is documented. Data modelling demonstrates the potential of AI-powered methods to generate insights from large-scale clinical notes.
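The two metrics reported above, Cohen’s kappa for inter-annotator agreement and the micro-averaged F1 score for multi-label classification, can be illustrated with a minimal sketch. This is not the authors’ evaluation code. The function names, the toy annotations and the label abbreviations ES and NSSI are assumptions made for illustration only, and the abstract does not state how kappa was aggregated across the four labels.

```python
from collections import Counter

# Label abbreviations: SI and SA come from the abstract; ES (exposure to
# suicide) and NSSI (non-suicidal self-injury) are assumed shorthand.
LABELS = ("SI", "SA", "ES", "NSSI")


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical labels over the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items on which both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


def micro_f1(gold, predicted):
    """Micro-averaged F1: pool TP/FP/FN over all notes and labels, then score."""
    tp = fp = fn = 0
    for gold_labels, pred_labels in zip(gold, predicted):
        tp += len(gold_labels & pred_labels)
        fp += len(pred_labels - gold_labels)
        fn += len(gold_labels - pred_labels)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (hypothetical): presence/absence of one label across ten notes.
annotator_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "no"]
annotator_2 = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "yes"]
kappa = cohen_kappa(annotator_1, annotator_2)

# Toy data (hypothetical): gold and predicted label sets for four notes.
gold_sets = [{"SI", "SA"}, {"SI"}, set(), {"NSSI"}]
pred_sets = [{"SI"}, {"SI", "SA"}, set(), {"NSSI"}]
f1 = micro_f1(gold_sets, pred_sets)

print(f"kappa={kappa:.2f} micro-F1={f1:.2f}")
```

Micro-averaging pools decisions across all labels before computing F1, so frequent labels such as SI and SA weigh more heavily on the score than rarer ones.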
Zehan Li
Northeastern University
Rafaela Miguel Vieira
Universidade Evangelica de Goiás
Sunyang Fu
University of Iowa
Yale University
Rice University
The University of Texas Health Science Center at Houston
Li et al. studied this question.
synapsesocial.com/papers/68c195649b7b07f3a0619543 — DOI: https://doi.org/10.1136/bmjdhai-2025-000019