Abstract

Introduction: Obstructive sleep apnea (OSA) remains highly prevalent yet markedly underdiagnosed. Consumer sleep technologies (CSTs) offer scalable access to objective sleep data, but device-specific differences in sensing and proprietary algorithms limit their clinical utility, and most prior work uses single-device datasets. We developed a cross-device harmonization framework to align heterogeneous CST-derived sleep metrics and evaluated its capacity to support scalable, device-agnostic OSA screening.

Methods: Objective sleep data from users of the SleepScore mobile app were contributed through Apple HealthKit from 138 third-party devices and apps, including most wearables (e.g., Apple Watch, Garmin, Oura, Whoop) and mobile sleep apps. OSA status was established via self-report of a clinical diagnosis (13% of users self-reported OSA). The dataset contained 19,431 users (4.3 million nights). Users reporting active treatment or comorbidities were excluded. A cross-device harmonization model aligned the heterogeneous device-derived sleep metrics onto a common scale, correcting systematic differences across device ecosystems and producing device-agnostic metrics while preserving physiologically relevant variance. For each user-device combination, 376 engineered features captured nightly distributions, weekday-weekend contrasts, and temporal trends. Automated machine learning (PyCaret) compared classifiers using 10-fold cross-validation, with an 80/20 user-level train/test split so that no user's data appeared in both sets, preventing leakage. The models were optimized for Kappa and evaluated using AUROC and standard classification metrics.

Results: A Linear Discriminant Analysis (LDA) model performed best (AUROC 0.77, accuracy 74%, sensitivity 63%, specificity 77%, precision 38%, F1-score 0.48, Kappa 0.32). Performance was relatively consistent across the 12 most common devices despite large differences in self-reported OSA prevalence (12–29%), with balanced accuracy ranging from 62% to 74% and Kappa from 0.23 to 0.34. Key non-demographic predictors reflected variability in sleep patterns, suggesting that fluctuations in sleep stability may play a role in identifying OSA risk.

Conclusion: These results support the potential of consumer sleep technologies, when combined with cross-device standardization, to improve early OSA identification independent of device type. The findings show promising device-agnostic performance while also indicating opportunities for refinement, including prospective PSG validation and improved labeling strategies.

Support (if any): Sleep.ai
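The user-level split and the agreement metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `split_users` and `binary_metrics`, the example rows, and the confusion counts are all hypothetical and chosen only for illustration. It shows why rows are assigned to train or test by user (a user with two devices contributes two user-device rows, which a row-level split could place on both sides) and how Kappa, sensitivity, specificity, precision, and F1 follow from a 2x2 confusion matrix.

```python
import random


def split_users(user_ids, test_fraction=0.2, seed=0):
    """Assign whole users to train or test so no user appears in both."""
    unique_users = sorted(set(user_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_users)
    n_test = max(1, round(len(unique_users) * test_fraction))
    test_users = set(unique_users[:n_test])
    train_idx = [i for i, u in enumerate(user_ids) if u not in test_users]
    test_idx = [i for i, u in enumerate(user_ids) if u in test_users]
    return train_idx, test_idx


def binary_metrics(tp, fp, fn, tn):
    """Standard binary metrics and Cohen's kappa from confusion counts."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((fn + tn) / n) * ((fp + tn) / n)
    expected = p_yes + p_no
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "kappa": kappa,
    }


# Hypothetical user-device rows: users u1 and u4 each appear with two devices.
rows = [("u1", "watch"), ("u1", "ring"), ("u2", "watch"), ("u3", "app"),
        ("u4", "ring"), ("u4", "app"), ("u5", "watch")]
user_ids = [user for user, _device in rows]

train_idx, test_idx = split_users(user_ids)
train_users = {user_ids[i] for i in train_idx}
test_users = {user_ids[i] for i in test_idx}
print("users shared between sets:", train_users & test_users)

# Illustrative confusion counts, not taken from the study.
metrics = binary_metrics(tp=40, fp=20, fn=10, tn=130)
print({name: round(value, 3) for name, value in metrics.items()})
```

Because Kappa corrects accuracy for chance agreement, it is less flattered by class imbalance than raw accuracy, which is one plausible reason to optimize for it when only a minority of users report OSA.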
Lynch et al. (Fri,) studied this question.