Accurate identification of acute and subacute subdural hematoma (acute/subacute SDH) is critical for improved patient outcomes. However, large-scale research is hindered by unreliable identification methods in electronic health records (EHRs). Current approaches relying on International Classification of Diseases (ICD) codes lack specificity and cannot distinguish acute, subacute, and chronic cases; manual chart review is too labor-intensive to scale. We developed an automated phenotyping algorithm using structured data and unstructured clinical notes for high-accuracy retrospective identification of acute/subacute SDH. We analyzed 2999 records from two hospitals, including ICD-positive and ICD-negative acute/subacute SDH cases verified by manual chart review. Features for model training included ICD codes, Current Procedural Terminology (CPT) codes, and clinical note keywords. Logistic regression and random forest models were trained using cross-validation and evaluated using AUROC and AUPRC. External validation involved training on one hospital and testing on the other. The random forest keywords-only model performed best, achieving an AUROC of 0.985 (95% CI: 0.980–0.990) and AUPRC of 0.944 (95% CI: 0.923–0.962) on the test set. External validation demonstrated strong AUROCs of 0.965 and 0.971 and AUPRCs of 0.831 and 0.840. The overall error rate was <1%. This model provides a scalable, highly accurate approach to acute/subacute SDH detection in EHR research.
Hooke et al. (Mon,) studied this question.
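The evaluation metrics named above can be illustrated with a minimal sketch. It assumes AUROC is computed through the rank-sum (Mann-Whitney) identity, AUPRC is approximated by average precision, and the 95% confidence intervals come from a percentile bootstrap; the abstract does not state how the intervals or the AUPRC were actually computed, so these are assumptions. The function names (`auroc`, `average_precision`, `bootstrap_ci`) and the toy labels and scores are hypothetical and are not taken from the study.

```python
import random


def auroc(labels, scores):
    """AUROC via the rank-sum identity; tied scores share their average rank."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    pairs = sorted(zip(scores, labels))
    rank_sum = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        # Items i..j-1 occupy 1-based ranks i+1..j; use their mean for ties.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def average_precision(labels, scores):
    """Average precision, used here as a stand-in for AUPRC (an assumption).

    Tied scores are kept in input order rather than grouped.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    order = sorted(range(len(labels)), key=lambda k: -scores[k])
    hits = 0
    total = 0.0
    for rank, k in enumerate(order, start=1):
        if labels[k] == 1:
            hits += 1
            total += hits / rank  # precision at each true positive
    return total / n_pos


def bootstrap_ci(metric, labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a metric (assumed CI method)."""
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("need both classes to bootstrap")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[k] for k in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(metric(ys, [scores[k] for k in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical toy data: 1 = chart-review-confirmed case, 0 = non-case.
toy_labels = [0, 0, 1, 0, 1, 1, 0, 1]
toy_scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.2, 0.9]

# 15 of the 16 positive/negative pairs are ranked correctly, so AUROC = 0.9375.
toy_auroc = auroc(toy_labels, toy_scores)
# (1/1 + 2/2 + 3/3 + 4/5) / 4, which is about 0.95.
toy_ap = average_precision(toy_labels, toy_scores)
toy_ci = bootstrap_ci(auroc, toy_labels, toy_scores, n_boot=200, seed=1)
```

Reporting both metrics is useful for a rare outcome, since AUROC can remain high while precision among flagged records drops; average precision is sensitive to that case mix. With only eight toy records the bootstrap interval is very wide, so it shows the mechanics only.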