What question did this study set out to answer?

To develop a vocabulary that enhances extraction of sleep-related information from pediatric clinical notes using natural language processing.

February 16, 2026 | Open Access

Development of a Rule-Based Natural Language Processing Algorithm to Extract Sleep Information in Pediatric Primary Care Patients with a Sleep Diagnosis

Key Points

Developed a low-resource vocabulary for pediatric sleep terms using medical ontologies and clinician input.
Examined clinical note narratives for pediatric sleep mentions.
Compared the vocabulary's effectiveness against manually annotated clinical notes.
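Achieved a recall of 0.992 at the clinical note level, meaning nearly all notes with sleep-related mentions were identified.
Attained a precision of 0.852 at the clinical note level; most false positives were notes that explicitly stated no sleep issues or contained text unrelated to patient sleep health.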
77.1% of annotated mentions contained at least one keyword from the developed vocabulary.

Abstract

Study Objectives: The current study employed natural language processing (NLP) to capture multidimensional and transdiagnostic information in pediatric clinical notes. We present a novel, low-resource sleep vocabulary that can be applied to notes to identify pediatric sleep-related mentions automatically.

Methods: Using a combination of existing medical sleep ontologies, interviews with clinicians, and examination of clinical note narratives, we developed a novel vocabulary of pediatric sleep-related terms and phrases that covers technical terms, abbreviations, and colloquial keywords used to describe pediatric sleep health. We compared our vocabulary against a set of manually annotated clinical notes to determine its effectiveness for identifying notes with pediatric sleep-related mentions.

Results: Our vocabulary correctly identified clinical notes with pediatric sleep-related mentions with a recall of 0.992 and a precision of 0.852. Most false positives occurred in notes that either explicitly stated no sleep issues or contained text unrelated to patient sleep health (e.g., medication side effects). Among the text spans annotated as sleep-related mentions, 77.1% included at least one keyword from our vocabulary.

Conclusions: Our vocabulary showed excellent performance for identifying pediatric sleep-related mentions at the clinical note level and reasonable performance for identifying the specific text containing patient mentions. Our low-resource vocabulary, which can be deployed in almost any compute environment, can serve as a first pass over clinical notes to flag which notes or note sections should be further processed by more advanced models or manual annotation review to identify narrower mentions.
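
As a rough illustration of the approach the abstract describes, the sketch below flags notes that contain any vocabulary keyword and then scores those flags against manual labels. It is a minimal sketch under stated assumptions, not the authors' implementation: the term list `VOCAB`, the example notes, the gold labels, and the function names are all hypothetical, and the study's actual vocabulary and matching rules are not reproduced here.

```python
import re

# Hypothetical mini-vocabulary mixing technical terms, abbreviations, and
# colloquial phrases. The study's real vocabulary is not reproduced here.
VOCAB = [
    "insomnia",
    "osa",
    "sleep apnea",
    "snoring",
    "night terrors",
    "trouble falling asleep",
]


def build_pattern(terms):
    """Compile one case-insensitive, whole-word pattern for all terms."""
    # Longest terms first so multi-word phrases are preferred over substrings.
    ordered = sorted(terms, key=len, reverse=True)
    body = "|".join(re.escape(t) for t in ordered)
    return re.compile(r"\b(?:" + body + r")\b", re.IGNORECASE)


PATTERN = build_pattern(VOCAB)


def find_mentions(note):
    """Return (start, end, matched_text) for each keyword hit in a note."""
    return [(m.start(), m.end(), m.group(0)) for m in PATTERN.finditer(note)]


def flag_note(note):
    """Note-level first pass: True if any vocabulary keyword appears."""
    return PATTERN.search(note) is not None


def precision_recall(predicted, gold):
    """Note-level precision and recall from parallel lists of booleans."""
    pairs = list(zip(predicted, gold))
    tp = sum(p and g for p, g in pairs)
    fp = sum(p and not g for p, g in pairs)
    fn = sum(g and not p for p, g in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up notes with made-up manual labels (True = sleep-related mention).
notes = [
    "Mom reports loud snoring and trouble falling asleep.",
    "Denies snoring. No sleep concerns.",
    "Well-child visit; immunizations up to date.",
    "History of OSA, s/p tonsillectomy.",
]
gold = [True, False, False, True]

flags = [flag_note(n) for n in notes]
precision, recall = precision_recall(flags, gold)
```

In this toy data the second note is flagged even though it denies a sleep problem, which mirrors the kind of false positive the Results describe. Plain keyword matching does not handle negation, so a first pass like this would hand such notes to a more advanced model or a manual reviewer.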
