Abstract

Introduction
Differentiating narcolepsy type 2 (NT2) from idiopathic hypersomnia (IH) is often challenging because the two conditions share clinical features and diagnostic testing has limitations. We applied machine learning to routine polysomnography (PSG) features to directly predict the expert clinical diagnosis of IH versus NT2.

Methods
We conducted a manual chart review of patients who underwent the Multiple Sleep Latency Test (MSLT) for suspected central disorders of hypersomnolence (CDH). Only patients classified clinically as IH or NT2 were included. Forty-five PSG metrics, including demographics, sleep architecture, and respiratory indices, were extracted from the final reports. Missing values were imputed with the median (numeric features) or the mode (categorical features), and features were standardized. Six machine-learning classifiers were evaluated using nested cross-validation (5-fold outer, 5-fold inner) with Optuna hyperparameter optimization (1,000 trials per inner fold). Features were selected within each fold using ANOVA (p < 0.05). SHAP values were used to quantify feature importance. Differences between IH and NT2 were tested with t-tests and chi-squared tests. Metrics are reported as mean (standard deviation).

Results
The cohort included 454 patients: 147 (32%) with a clinical diagnosis of IH and 307 (68%) with NT2. Overall, age was 34.0 (13.1) years and BMI was 27.6 (6.6) kg/m²; 351 (77.3%) patients were female and 327 (72.0%) were Caucasian. Sex distribution differed between IH and NT2 (female: 81.4% vs 68.7%). Within the clinically defined IH and NT2 cohort, agreement with ICSD-3 diagnoses was 58% for IH, 67% for NT2, and 61% overall. The logistic regression classifier performed best, with an AUC-ROC of 66% (5%) and a balanced accuracy of 63% (4%). At a 60% probability threshold for NT2, precision was 49% (8%), sensitivity was 36% (6%), and specificity was 82% (4%). The SHAP analysis indicated that the features most strongly associated with NT2 were shorter REM latency (p < 0.001), lower non-REM sleep time (p = 0.003), higher sleep efficiency in the supine position (p = 0.023), and male sex (p = 0.014).

Conclusion
ICSD-3 (MSLT-based) diagnoses show poor agreement with expert clinical diagnoses of NT2 and IH, highlighting the limitations of current diagnostic criteria and the need for alternative diagnostic modalities. Machine-learning models applied to routine PSG features provide only moderate differentiation between NT2 and IH.
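As a minimal sketch of how the threshold-dependent metrics in the Results can be derived, the Python snippet below applies a 0.60 probability cutoff for NT2 and computes precision, sensitivity, specificity, and balanced accuracy from the resulting confusion matrix. The function name `threshold_metrics`, the label coding (1 = NT2, 0 = IH), the use of `>=` at the cutoff, and the toy probabilities are illustrative assumptions, not the study's code or data.

```python
def threshold_metrics(y_true, p_nt2, threshold=0.60):
    """Confusion-matrix metrics with NT2 (label 1) as the positive class."""
    # Assumption: a case is called NT2 when its probability reaches the cutoff.
    y_pred = [1 if p >= threshold else 0 for p in p_nt2]

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, y in pairs if t == 1 and y == 1)
    fp = sum(1 for t, y in pairs if t == 0 and y == 1)
    tn = sum(1 for t, y in pairs if t == 0 and y == 0)
    fn = sum(1 for t, y in pairs if t == 1 and y == 0)

    nan = float("nan")
    precision = tp / (tp + fp) if (tp + fp) else nan
    sensitivity = tp / (tp + fn) if (tp + fn) else nan
    specificity = tn / (tn + fp) if (tn + fp) else nan
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Balanced accuracy is the mean of sensitivity and specificity.
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Toy example (hypothetical values, not study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
p_nt2 = [0.90, 0.55, 0.40, 0.70, 0.30, 0.20, 0.10, 0.45]
metrics = threshold_metrics(y_true, p_nt2)
```

Raising the cutoff above 0.50 trades sensitivity for specificity, which matches the pattern reported above (low sensitivity, high specificity for NT2).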
Araujo et al. (Fri,) studied this question.
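The nested cross-validation described in the Methods can be outlined as follows. This is an illustrative sketch only: it builds 5-fold outer and 5-fold inner index splits and fits median imputation on the training portion of each outer split, so that no statistics leak from held-out patients. The helper names (`k_fold_indices`, `nested_splits`, `fit_median`, `impute`), the fixed seed, and the unstratified splitting are assumptions; the abstract does not state whether folds were stratified, and hyperparameter search, ANOVA selection, and model fitting are omitted.

```python
import random
from statistics import median


def k_fold_indices(indices, k, seed=0):
    """Shuffle indices and split them into k nearly equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_splits(n_samples, outer_k=5, inner_k=5, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for nested CV."""
    outer_folds = k_fold_indices(range(n_samples), outer_k, seed)
    for i, outer_test in enumerate(outer_folds):
        outer_train = [j for f, fold in enumerate(outer_folds) if f != i for j in fold]
        # Inner folds are drawn only from the outer training set.
        inner_folds = k_fold_indices(outer_train, inner_k, seed + 1 + i)
        inner_splits = []
        for m, inner_val in enumerate(inner_folds):
            inner_train = [j for f, fold in enumerate(inner_folds) if f != m for j in fold]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_test, inner_splits


def fit_median(values):
    """Median of the observed (non-None) values in a training fold."""
    return median(v for v in values if v is not None)


def impute(values, fill):
    """Replace missing (None) entries with a value learned on training data."""
    return [fill if v is None else v for v in values]


# Toy numeric feature with missing entries (hypothetical, not study data).
feature = [float(i) if i % 7 else None for i in range(50)]
for outer_train, outer_test, inner_splits in nested_splits(len(feature)):
    # Fit the imputation value on training patients only, then apply it to the test fold.
    fill = fit_median([feature[j] for j in outer_train])
    test_values = impute([feature[j] for j in outer_test], fill)
```

In a full pipeline, the inner splits would drive hyperparameter selection and the outer test folds would be used only for the final performance estimate.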