BACKGROUND: Obstructive sleep apnoea (OSA) affects 38% of the population, yet over 90% of cases remain undiagnosed. The current gold standard for diagnosis, polysomnography (PSG), requires specialised equipment and trained personnel, making it inaccessible in primary care and acute settings. With advances in AI, oximetry-based AI models have emerged as a potential alternative for OSA diagnosis. OBJECTIVE: This meta-analysis aims to evaluate the diagnostic accuracy of AI models trained on pulse oximetry readings in diagnosing OSA. METHODS: A systematic search was conducted across Medline/PubMed, Embase, Scopus, Web of Science, and IEEE Xplore from inception to 3 January 2026. Studies that evaluated the diagnostic accuracy of AI models trained on SpO₂ recordings, with the apnoea-hypopnoea index (AHI) as the reference standard, were eligible and were screened by two blinded, independent reviewers. Studies that did not evaluate AI on AHI-defined OSA, and non-English texts, were excluded. Diagnostic accuracy was pooled using Bayesian bivariate meta-analysis and meta-regression. Publication bias was examined using a selection model approach, while risk of bias and evidence quality were assessed with QUADAS-2 and GRADE, respectively. RESULTS: Of 13,986 screened articles, 25 studies met the inclusion criteria, encompassing 23,171 participants with study-level mean ages of 40 to 63 years and mean BMIs of 25 to 37 kg/m². AI-oximetry models demonstrated a pooled sensitivity of 91.1% (95% CrI: 89.7-92.4%) and specificity of 88.4% (95% CrI: 85.3-90.8%), with a diagnostic odds ratio (DOR) of 77.7 (95% CrI: 60.2-99.6). Neural network classifiers achieved the highest sensitivity (92.7%) and specificity (91.3%). Models using deep learning feature extraction had significantly higher sensitivity, by 3.7%, than domain expert-based approaches. Sensitivity decreased slightly with higher AHI cut-offs, while specificity increased by 16.6% from an AHI cut-off of ≥5 to ≥30.
Sensitivity analyses showed that even assuming up to a 40% probability that studies remained unpublished, changes in accuracy were modest (AUC: 0.902 to 0.877). QUADAS-2 and GRADE assessments found low-to-moderate risk of bias with high overall quality of evidence. CONCLUSIONS: AI-oximetry models showed high diagnostic accuracy for OSA across model types and AHI cut-offs, performing better than or comparably to traditional overnight oximetry and home sleep apnoea tests (HSATs). This review provides the first pooled quantitative synthesis of AI models trained solely on oximetry data, with additional evaluations of publication bias and methodological limitations. Prior reviews were largely narrative or covered AI inputs other than oximetry. This study advances the field by offering a clearer and more reliable evidence base on pooled AI-oximetry performance. These findings support oximetry-based AI as a convenient and scalable tool for OSA screening and diagnosis, with potential real-world applications in both primary care and inpatient settings for early identification of high-risk patients. Prospective external validation in diverse populations and low-prevalence settings is still needed before widespread real-world use. CLINICALTRIAL: This review was registered on PROSPERO (CRD42025648556) and received no funding.
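As a rough consistency check on the pooled estimates reported above, the likelihood ratios and diagnostic odds ratio can be derived from a single sensitivity and specificity pair. The sketch below is illustrative only: the function name `diagnostic_summary` is hypothetical and not part of the review, and plugging in point estimates is a simplification, since the review's DOR is a summary from a Bayesian bivariate model. The result should therefore approximate, not reproduce, the reported value.

```python
def diagnostic_summary(sensitivity: float, specificity: float) -> dict:
    """Derive likelihood ratios and the diagnostic odds ratio (DOR)
    from one sensitivity/specificity pair given as proportions in (0, 1).

    Hypothetical helper for illustration; not the review's method.
    """
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")

    # Positive likelihood ratio: true-positive rate over false-positive rate.
    lr_pos = sensitivity / (1.0 - specificity)
    # Negative likelihood ratio: false-negative rate over true-negative rate.
    lr_neg = (1.0 - sensitivity) / specificity
    # DOR is the ratio of the two likelihood ratios.
    dor = lr_pos / lr_neg

    return {"lr_pos": lr_pos, "lr_neg": lr_neg, "dor": dor}


# Pooled point estimates taken from the RESULTS above (91.1% and 88.4%).
summary = diagnostic_summary(0.911, 0.884)
print({name: round(value, 2) for name, value in summary.items()})
```

With the pooled sensitivity of 91.1% and specificity of 88.4%, this gives a DOR of about 78, close to the reported 77.7.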
Yam et al. studied this question.