Abstract

Study Objectives
This study used natural language processing (NLP) to capture multidimensional and transdiagnostic information in pediatric clinical notes. We present a novel, low-resource sleep vocabulary that can be applied to clinical notes to automatically identify pediatric sleep-related mentions.

Methods
Using a combination of existing medical sleep ontologies, interviews with clinicians, and examination of clinical note narratives, we developed a vocabulary of pediatric sleep-related terms and phrases that covers the technical terms, abbreviations, and colloquial keywords used to describe pediatric sleep health. We compared the vocabulary against a set of manually annotated clinical notes to determine how effectively it identifies notes with pediatric sleep-related mentions.

Results
The vocabulary identified clinical notes with pediatric sleep-related mentions with a recall of 0.992 and a precision of 0.852. Most false positives occurred in notes that either explicitly stated no sleep issues or contained text unrelated to patient sleep health (e.g., medication side effects). Among the text spans annotated as sleep-related mentions, 77.1% included at least one keyword from our vocabulary.

Conclusions
The vocabulary showed excellent performance for identifying pediatric sleep-related mentions at the clinical note level and moderate performance for identifying the specific text spans containing those mentions. Because it is low-resource and can be deployed in almost any compute environment, it can serve as a first pass over clinical notes, flagging which notes or note sections should be further processed by more advanced models or manual annotation review to identify narrower mentions.
Sirrianni et al. studied this question.
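The note-level evaluation described in the abstract can be illustrated with a minimal sketch: match a keyword vocabulary against each note, flag notes with at least one hit, and compare the flags to manual annotations. Everything below is an assumption for illustration only. The terms in `SLEEP_VOCAB`, the toy notes, and the helper names `build_pattern`, `find_mentions`, and `precision_recall` are hypothetical; they are not the authors' vocabulary, data, or implementation, and the matching strategy (case-insensitive whole-word regex) is one plausible choice rather than the method the study reports.

```python
import re

# Hypothetical mini-vocabulary; NOT the vocabulary from the study.
SLEEP_VOCAB = ["insomnia", "osa", "sleep apnea", "snoring",
               "trouble sleeping", "night terrors"]

def build_pattern(terms):
    """Compile one case-insensitive, whole-word pattern for all terms."""
    # Longest terms first so multi-word phrases are tried before shorter ones.
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)

def find_mentions(note, pattern):
    """Return the lowercased vocabulary terms found in a note."""
    return [m.group(0).lower() for m in pattern.finditer(note)]

def precision_recall(predicted, gold):
    """Note-level precision and recall from parallel boolean lists."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Toy notes paired with made-up manual labels (True = sleep-related mention).
notes = [
    ("Mom reports trouble sleeping and loud snoring.", True),
    ("Denies insomnia. No sleep concerns.", False),   # negated mention
    ("Follow-up for asthma; inhaler refilled.", False),
    ("Hx of OSA, uses CPAP nightly.", True),
    ("Child wakes up several times each night.", True),  # no vocabulary term
]

pattern = build_pattern(SLEEP_VOCAB)
predicted = [bool(find_mentions(text, pattern)) for text, _ in notes]
gold = [label for _, label in notes]
precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

The second toy note shows the kind of false positive the abstract describes: a plain keyword match flags a note that explicitly denies a sleep issue. The last note shows the opposite failure, a sleep-related description that contains no vocabulary term. A downstream model or manual review, as the abstract suggests, would be the place to resolve such cases.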