Abstract

Introduction
Obstructive sleep apnea (OSA) and insomnia remain widely underdiagnosed. Contactless smartphone applications now provide longitudinal, ecologically valid sleep data at scale. We tested whether machine learning (ML) models trained on these data, with systematic hyperparameter and threshold optimization, can effectively screen for OSA and insomnia relative to validated questionnaire-based tools.

Methods
Observational ML modeling used sleep-stage estimates from a PSG-validated smartphone sonar application (SleepScore). Two independent cohorts were analyzed. The survey-threshold (ST) set included 2,813 users (423,085 nights) labeled by Athens Insomnia Scale ≥6 and NoSAS ≥8. The self-report (SR) set included 68,230 users (2.27 million nights) who reported prior insomnia (6%) or OSA (9%) diagnoses. We excluded users undergoing treatment or with comorbid sleep disorders. To improve label accuracy for the ST cohort, inclusion required disorder-consistent objective sleep patterns: for insomnia, sleep-onset latency 30 min, wake after sleep onset 30 min, and sleep duration 6.5 h with sleep efficiency 80% or sleep-maintenance index 85%; for OSA, ≥12 awakenings on ≥3 consecutive nights or ≥7 on ≥3 nights/week for ≥4 of 8 weeks. For each user, 262 features summarized nightly time series (distributional metrics, weekday–weekend contrasts, temporal and frequency features, age, and gender). An automated ML framework (PyCaret) compared classifiers using 10-fold cross-validation with 20% holdout, optimizing Cohen’s κ. Top models underwent Bayesian hyperparameter tuning (Optuna) with cross-validation and threshold calibration. All splits were user level (80/20) to prevent target leakage.

Results
After tuning, best-performing models achieved AUCs of 0.80 (OSA-SR, linear discriminant analysis), 0.74 (OSA-ST, Random Forest), 0.73 (insomnia-SR, Random Forest), and 0.62 (insomnia-ST, ensembled Random Forest).
OSA-ST showed balanced precision (0.73) and recall (0.70); OSA-SR favored recall (0.76) over precision (0.29), consistent with class imbalance. Insomnia-ST achieved high recall (0.87) with strong precision (0.78). Feature importance revealed insomnia predictions were driven by sleep-onset latency and timing metrics, while OSA predictions emphasized sleep architecture and fragmentation indices.

Conclusion
Tuned ML models using objective sleep data from a smartphone app identified OSA and insomnia risk with promising discriminative performance and high recall. This scalable approach could help address chronic underdiagnosis by flagging at-risk users for confirmatory testing and earlier intervention.

Support (if any)
Sleep.ai
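
The user-level 80/20 split and the κ-oriented threshold calibration described in the Methods can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's PyCaret/Optuna pipeline: the function names (`split_users`, `cohen_kappa`, `best_kappa_threshold`), the fixed threshold grid, and the random seed are all hypothetical, and labels are assumed to be binary (1 = disorder, 0 = no disorder).

```python
import random


def split_users(user_ids, holdout_frac=0.2, seed=0):
    """Assign whole users to train or holdout so that no user's nights
    (or features derived from them) appear on both sides of the split."""
    users = sorted(set(user_ids))          # de-duplicate repeated night rows
    rng = random.Random(seed)              # hypothetical fixed seed
    rng.shuffle(users)
    n_holdout = max(1, round(len(users) * holdout_frac))
    return set(users[n_holdout:]), set(users[:n_holdout])


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for binary labels: agreement corrected for chance."""
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    p_obs = (tp + tn) / n
    # Chance agreement from the marginal totals of truth and prediction.
    p_exp = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    if p_exp == 1.0:                       # degenerate case: a single class
        return 0.0
    return (p_obs - p_exp) / (1.0 - p_exp)


def best_kappa_threshold(y_true, scores, grid=None):
    """Sweep decision thresholds and keep the one that maximizes kappa."""
    if grid is None:
        grid = [i / 100 for i in range(1, 100)]   # assumed grid: 0.01 .. 0.99
    best_t, best_k = 0.5, float("-inf")
    for t in grid:
        preds = [1 if s >= t else 0 for s in scores]
        k = cohen_kappa(y_true, preds)
        if k > best_k:
            best_t, best_k = t, k
    return best_t, best_k
```

In a workflow of this kind, the threshold would be chosen on training users only (for example, from cross-validated scores) and then applied unchanged to the holdout users, so that the reported precision and recall are not inflated by tuning on the evaluation data.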
Gottlieb et al. (Fri,) studied this question.