What question did this study set out to answer?

The study aimed to predict the expert clinical diagnosis of idiopathic hypersomnia versus narcolepsy type 2 by applying machine learning to routine polysomnography features.

May 10, 2026

0692 Using Machine Learning to Predict Clinical Diagnosis of Idiopathic Hypersomnia vs Narcolepsy Type 2 from Polysomnography

Key Points

The aim is to predict clinical diagnoses of idiopathic hypersomnia and narcolepsy type 2 using machine learning on polysomnography features.
Conducted manual chart review of 454 patients clinically diagnosed with idiopathic hypersomnia or narcolepsy type 2.
Extracted 45 polysomnography metrics for analysis, employing six machine learning classifiers with hyperparameter optimization.
Used SHAP values to quantify feature importance, and t-tests and chi-squared tests to assess differences between the two diagnostic groups.
The logistic regression classifier performed best, with an AUC-ROC of 66% and a balanced accuracy of 63%.
Using a 60% probability threshold for narcolepsy type 2, precision was 49%, sensitivity was 36%, and specificity was 82%.
Poor agreement was observed between ICSD-3 (MSLT-based) diagnoses and expert clinical diagnoses: 61% overall.
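
The threshold-based metrics in the key points can be illustrated with a small sketch. It assumes NT2 is the positive class and that a patient is labeled NT2 when the predicted probability is at or above the threshold; the function name `threshold_metrics` and the toy data are hypothetical and are not taken from the study.

```python
from typing import Dict, Sequence


def threshold_metrics(y_true: Sequence[int],
                      p_nt2: Sequence[float],
                      threshold: float = 0.60) -> Dict[str, float]:
    """Compute classification metrics at a fixed probability threshold.

    y_true: 1 = NT2 (assumed positive class), 0 = IH.
    p_nt2:  predicted probability of NT2 for each patient.
    """
    # Assumption: predict NT2 when the probability is >= threshold.
    y_pred = [1 if p >= threshold else 0 for p in p_nt2]

    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    nan = float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else nan  # recall for NT2
    specificity = tn / (tn + fp) if (tn + fp) else nan  # recall for IH
    precision = tp / (tp + fp) if (tp + fp) else nan    # positive predictive value
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Toy example (made-up labels and probabilities, not study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
p_nt2 = [0.72, 0.55, 0.61, 0.30, 0.65, 0.20, 0.45, 0.58]
metrics = threshold_metrics(y_true, p_nt2)
print(metrics)
```

Raising the threshold generally trades sensitivity for specificity, which matches the pattern reported here (low sensitivity, higher specificity at 60%).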

Abstract

Introduction

Differentiation between narcolepsy type 2 (NT2) and idiopathic hypersomnia (IH) is often challenging due to overlapping features and limitations of diagnostic testing. We used machine learning on routine polysomnography (PSG) features to directly predict the expert clinical diagnosis of IH versus NT2.

Methods

We conducted a manual chart review of patients undergoing the Multiple Sleep Latency Test (MSLT) for suspected central disorders of hypersomnolence (CDH). Only individuals classified clinically as IH or NT2 were included. Forty-five PSG metrics were extracted from final reports, including demographics, sleep architecture, and respiratory indices. Missing values were imputed using the median (for numeric features) or the mode (for categorical features), and features were standardized. Six machine-learning classifiers were evaluated using nested cross-validation (5-fold outer, 5-fold inner) with Optuna hyperparameter optimization (1,000 trials per inner fold). Features were selected within each fold using ANOVA (p < 0.05). SHAP values were used to quantify feature importance. T-tests and chi-squared tests were used to assess the statistical significance of differences between IH and NT2. Metrics are reported as mean (standard deviation).

Results

The cohort included 454 patients: 147 (32%) with a clinical diagnosis of IH and 307 (68%) with NT2. Overall, age was 34.0 (13.1) years and BMI was 27.6 (6.6) kg/m²; 351 (77.3%) patients were female and 327 (72.0%) were Caucasian. Sex distribution differed between IH and NT2 (female 81.4% vs 68.7%). Within the clinically defined IH and NT2 cohort, agreement with ICSD-3 diagnoses was 58% for IH, 67% for NT2, and 61% overall. The logistic regression classifier achieved the best performance, with an AUC-ROC of 66% (5%) and a balanced accuracy of 63% (4%). Using a 60% probability threshold for NT2, precision was 49% (8%), sensitivity was 36% (6%), and specificity was 82% (4%). The SHAP analysis indicated that the features most strongly associated with NT2 were shorter REM latency (p < 0.001), lower non-REM sleep time (p = 0.003), higher sleep efficiency in the supine position (p = 0.023), and male sex (p = 0.014).

Conclusion

ICSD-3 (MSLT-based) diagnoses show poor agreement with expert clinical diagnoses of NT2/IH, highlighting the limitations of current diagnostic criteria and the need for alternative diagnostic modalities. Machine learning models applied to routine PSG features provide only moderate differentiation between NT2 and IH.
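
The nested cross-validation design described in the methods (5 outer folds, each with 5 inner folds) can be sketched in plain Python. This is an illustration of the general technique only, not the authors' pipeline. The helper names (`k_fold_indices`, `nested_splits`, `fit_impute_scale`) are hypothetical, the folds are shuffled without stratification, and fitting the imputation and scaling on training data only is an assumption the abstract does not state. The Optuna search that would run inside each inner loop is omitted.

```python
import random
from statistics import mean, median, pstdev


def k_fold_indices(indices, k, seed):
    """Shuffle indices and split them into k nearly equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_splits(n_samples, outer_k=5, inner_k=5, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for nested cross-validation.

    inner_splits holds (inner_train, inner_val) pairs drawn only from
    outer_train, so the outer test fold never influences model tuning.
    """
    outer_folds = k_fold_indices(range(n_samples), outer_k, seed)
    for i, outer_test in enumerate(outer_folds):
        outer_train = [idx for j, fold in enumerate(outer_folds)
                       if j != i for idx in fold]
        inner_folds = k_fold_indices(outer_train, inner_k, seed + i + 1)
        inner_splits = []
        for m, inner_val in enumerate(inner_folds):
            inner_train = [idx for n, fold in enumerate(inner_folds)
                           if n != m for idx in fold]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_test, inner_splits


def fit_impute_scale(train_values):
    """Learn median, mean, and SD from training values (None = missing).

    Returns a function that imputes a missing value with the training
    median and then standardizes with the training mean and SD.
    """
    observed = [v for v in train_values if v is not None]
    med = median(observed)
    filled = [med if v is None else v for v in train_values]
    mu = mean(filled)
    sd = pstdev(filled) or 1.0  # guard against zero variance
    return lambda v: ((med if v is None else v) - mu) / sd


# 454 matches the cohort size reported in the abstract.
splits = list(nested_splits(454))

# Toy numeric feature with one missing value (made-up numbers).
transform = fit_impute_scale([60, None, 90, 120])
```

In a full pipeline, each inner split would score one hyperparameter candidate, the best candidate would be refit on `outer_train`, and performance would be reported only on `outer_test`.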
