Objective: Unstructured electronic health record (EHR) data are increasingly used to enhance suicide risk modeling. Unfortunately, EHR-reported death dates are frequently inaccurate. Including EHR data recorded after patients' deaths, or after the suicidal actions that led to their deaths, can bias suicide prediction models. In contrast to prior methods, which withheld all data from the 5 days before the reported death date, this study investigates using natural language processing to improve the accuracy of detecting EHR-reported death dates.

Methods: We selected all Veterans Affairs patients who died by suicide during 2017–2018 and had EHR data in the 5 days before the reported death date (n = 1620), and extracted all EHR texts from that interval (texts = 9127). We randomly selected a subset of the corpus to develop code that identifies whether texts were written before or after death or suicidal action, then applied this approach to the full corpus.

Results: In the full corpus, we identified 1742 texts entered on the reported death date, 274 texts after the death date, and 1556 texts that did not reference death or suicidal action but were entered chronologically after other texts indicating death. In contrast to the prior method, which excluded all interval texts, our derived approach retained 60.9% of interval data.

Conclusions: Our approach improved detection of valid EHR data in the interval before patient death.

Relevance to clinical practice: This study operationalizes a method to detect immediate pre-mortem EHR data that could reduce bias in suicide risk modeling. Using such data can improve risk prediction and, in turn, bolster prevention services.
Levis et al. studied this question.
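The filtering idea described in the Methods and Results can be illustrated with a minimal Python sketch. It is not the study's code: the `Note` class, the `DEATH_PATTERN` keyword list, and the `retain_premortem` function are hypothetical names introduced here. The sketch assumes that a text is excluded if it was entered on or after the reported death date, if it references death or suicidal action, or if it was entered chronologically after such a text.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime

# Hypothetical keyword pattern (an assumption for illustration). The abstract
# does not list the study's rules, and a real system would also need to handle
# negation, e.g. "denies thoughts of death".
DEATH_PATTERN = re.compile(
    r"\b(deceased|died|death|expired|found unresponsive|suicide)\b",
    re.IGNORECASE,
)


@dataclass
class Note:
    """One EHR text with its entry timestamp (hypothetical structure)."""
    entered: datetime
    text: str


def retain_premortem(notes, reported_death):
    """Return the notes judged to precede death or the fatal suicidal action."""
    kept = []
    for note in sorted(notes, key=lambda n: n.entered):
        # Texts entered on or after the reported death date are excluded,
        # and every later text falls in the same category.
        if note.entered.date() >= reported_death:
            break
        # A text indicating death marks this text and all chronologically
        # later texts as post-event, even if they do not mention death.
        if DEATH_PATTERN.search(note.text):
            break
        kept.append(note)
    return kept
```

With the counts reported above, the excluded categories total 1742 + 274 + 1556 = 3572 texts, leaving 9127 - 3572 = 5555 texts, which is 60.9% of the interval data.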