May 20, 2026

A30-03 Deep Learning Optimization of Sleep Stage Classification Using Wearable-Derived SpO2, Pulse Rate, and Movement Features

Key Points

The study aims to assess the accuracy of sleep staging using a deep learning model on wearable-derived signals from patients with obstructive sleep apnea.
Data were collected from 90 in-lab polysomnography (PSG) sessions paired with wearable recordings from 25 participants with OSA.
A 4-class sleep-staging neural network was trained on wearable signals synchronized with expert-scored hypnograms.
Model performance was assessed using Cohen’s Kappa, sensitivity, specificity, and AUC metrics.
Kappa values for sleep classification were 0.38 ± 0.12 (four-class), 0.46 ± 0.12 (three-class), and 0.60 ± 0.11 (two-class), indicating fair to moderate agreement with PSG.
Sensitivity was high for Light sleep (0.71 ± 0.14) and NREM sleep (0.88 ± 0.09) in the four- and three-class models, respectively, while REM sensitivity was low at 0.25 ± 0.24.
Sleep/Wake detection showed strong performance, with a sensitivity of 0.94 ± 0.05 and an AUC of 0.90.
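The agreement and detection metrics summarized above can be illustrated with a short, dependency-free Python sketch. It computes Cohen's Kappa on epoch-by-epoch labels, collapses a four-class hypnogram into three- and two-class versions, derives one-vs-rest sensitivity and specificity, and estimates AUC from continuous scores. The stage name "Deep", the class mappings, the toy data, and all function names are illustrative assumptions; the abstract does not state how stages were grouped or how the metrics were implemented.

```python
"""Sketch of epoch-level agreement and detection metrics (stdlib only).

Assumptions (not from the abstract): four stages labeled "Wake", "Light",
"Deep", "REM"; the three-class task merges Light and Deep into "NREM";
the two-class task merges all sleep stages into "Sleep".
"""
from collections import Counter

# Hypothetical mappings used to collapse a four-class hypnogram.
TO_THREE = {"Wake": "Wake", "Light": "NREM", "Deep": "NREM", "REM": "REM"}
TO_TWO = {"Wake": "Wake", "Light": "Sleep", "Deep": "Sleep", "REM": "Sleep"}


def _ratio(num, den):
    """Safe division: NaN when the denominator is zero."""
    return num / den if den else float("nan")


def cohen_kappa(ref, pred):
    """Cohen's kappa between reference (PSG) and predicted label sequences."""
    n = len(ref)
    p_obs = sum(r == p for r, p in zip(ref, pred)) / n
    ref_counts, pred_counts = Counter(ref), Counter(pred)
    # Chance agreement: sum over labels of the product of both raters' marginal rates.
    p_exp = sum(ref_counts[k] * pred_counts[k] for k in ref_counts) / (n * n)
    # Kappa is undefined (NaN here) when chance agreement is 1.
    return _ratio(p_obs - p_exp, 1.0 - p_exp)


def sensitivity_specificity(ref, pred, positive):
    """One-vs-rest sensitivity and specificity for the class `positive`."""
    pairs = list(zip(ref, pred))
    tp = sum(r == positive and p == positive for r, p in pairs)
    fn = sum(r == positive and p != positive for r, p in pairs)
    tn = sum(r != positive and p != positive for r, p in pairs)
    fp = sum(r != positive and p == positive for r, p in pairs)
    return _ratio(tp, tp + fn), _ratio(tn, tn + fp)


def auc(scores, labels):
    """ROC AUC as the Mann-Whitney probability that a random positive epoch
    outscores a random negative one (ties count as 1/2). O(n_pos * n_neg)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return _ratio(wins, len(pos) * len(neg))


if __name__ == "__main__":
    # Toy hypnograms, one label per 30-s epoch (made-up data).
    ref = ["Wake", "Light", "Light", "Deep", "REM", "Light", "Wake", "REM"]
    pred = ["Wake", "Light", "Deep", "Deep", "Light", "Light", "Light", "REM"]
    for name, mapping in [("4-class", None), ("3-class", TO_THREE), ("2-class", TO_TWO)]:
        r = ref if mapping is None else [mapping[x] for x in ref]
        p = pred if mapping is None else [mapping[x] for x in pred]
        print(name, "kappa =", round(cohen_kappa(r, p), 3))
    print("Wake sens/spec:", sensitivity_specificity(ref, pred, "Wake"))
    # Hypothetical per-epoch sleep probabilities scored against PSG sleep (1) / wake (0).
    sleep_prob = [0.2, 0.9, 0.8, 0.95, 0.6, 0.85, 0.7, 0.65]
    is_sleep = [int(x != "Wake") for x in ref]
    print("Sleep/Wake AUC =", round(auc(sleep_prob, is_sleep), 3))
```

Collapsing labels before computing Kappa is one plausible way to obtain three- and two-class scores from a single four-class model; training separate models per task would be an equally valid reading of the abstract.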

Abstract

Rationale: Sleep staging has classically relied on expert-scored polysomnography (PSG), which requires comprehensive electrophysiological recordings. Despite advances in automated PSG-based staging, the cost and complexity of PSG still limit long-term or home use. Minimally obtrusive wearables allow sleep architecture to be assessed over time in natural environments, which is clinically valuable for patients with obstructive sleep apnea (OSA) who need ongoing treatment monitoring. Commercial consumer wearables typically rely on high-rate photoplethysmography (PPG) or multiple sensors and are designed mainly for healthy users. We propose a proximal finger-mounted pulse oximeter (the wearable) that uses transmittance-based PPG to derive SpO2, pulse rate (PR), and motion at 1 Hz, providing a practical tool for continuously monitoring disease severity or therapeutic response. Our study evaluates whether accurate sleep staging can be achieved by applying a deep learning framework to these signals.

Methods: Data were collected from 90 in-lab PSG sessions paired with wearable recordings from 25 participants with mild-to-severe OSA, each of whom underwent up to four PSGs under baseline, placebo, and active treatment conditions (NCT05793684). Expert-scored hypnograms were synchronized with the wearable signals and split by participant into training, validation, and test sets. Wearable data were preprocessed by physiologic-range validation, interpolation, and per-participant centering. A deep learning model was trained on 390-second input windows, each centered on a 30-second scoring epoch. The 4-class staging network comprised 1D convolutional layers, bidirectional LSTM (Bi-LSTM) recurrent layers, and a fully connected layer that also received an extra feature, PR × SpO2. Training used participant-wise stratified data splits, early stopping, and threshold-aware evaluation. Performance was assessed using Cohen’s Kappa, sensitivity, specificity, and the area under the receiver operating characteristic curve (AUC).
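The preprocessing and windowing steps described in the Methods can be sketched in stdlib Python. At 1 Hz, a 30-second epoch is 30 samples, and a 390-second window corresponds to that epoch plus 180 seconds (six epochs) of context on each side. The physiologic limits, the choice of linear interpolation, the zero-padding at recording edges, and all names below are hypothetical choices for illustration, not the authors' pipeline.

```python
"""Sketch of wearable preprocessing and epoch windowing (stdlib only).

Assumptions (not from the abstract): the validity limits below, linear
interpolation across invalid samples, centering on the participant's mean,
and zero-padding of windows that extend past a recording's edges.
"""
from statistics import fmean

FS_HZ = 1        # sampling rate of SpO2, PR, and motion (from the abstract)
EPOCH_S = 30     # scoring epoch length in seconds
WINDOW_S = 390   # model input: the epoch plus 180 s of context on each side

# Hypothetical physiologic limits; the abstract does not state the real ones.
VALID_RANGE = {"spo2": (50.0, 100.0), "pr": (30.0, 200.0)}


def validate(signal, low, high):
    """Mark samples outside [low, high] (or already missing) as None."""
    return [x if x is not None and low <= x <= high else None for x in signal]


def interpolate(signal):
    """Linearly fill None gaps; leading/trailing gaps copy the nearest valid sample."""
    valid = [i for i, x in enumerate(signal) if x is not None]
    if not valid:
        raise ValueError("no valid samples to interpolate from")
    out = list(signal)
    for i in range(valid[0]):
        out[i] = signal[valid[0]]
    for i in range(valid[-1] + 1, len(signal)):
        out[i] = signal[valid[-1]]
    for a, b in zip(valid, valid[1:]):
        for i in range(a + 1, b):  # empty range when a and b are adjacent
            out[i] = signal[a] + (i - a) / (b - a) * (signal[b] - signal[a])
    return out


def center(signal):
    """Subtract the mean; pass all of one participant's samples together."""
    mu = fmean(signal)
    return [x - mu for x in signal]


def epoch_windows(signal):
    """Yield a WINDOW_S-second window centered on each complete 30-s epoch."""
    epoch_len = EPOCH_S * FS_HZ
    context = (WINDOW_S - EPOCH_S) // 2 * FS_HZ  # 180 samples at 1 Hz
    for e in range(len(signal) // epoch_len):
        start = e * epoch_len - context
        stop = (e + 1) * epoch_len + context
        # Zero-pad outside the recording (0 equals the mean after centering).
        yield [signal[i] if 0 <= i < len(signal) else 0.0 for i in range(start, stop)]


if __name__ == "__main__":
    raw_spo2 = [95.0] * 60 + [0.0, 0.0] + [93.0] * 58  # two dropout samples
    low, high = VALID_RANGE["spo2"]
    spo2 = center(interpolate(validate(raw_spo2, low, high)))
    windows = list(epoch_windows(spo2))
    print(len(windows), "windows of", len(windows[0]), "samples")
```

Padding with zeros after centering means edge windows are filled with the participant's mean rather than an arbitrary offset; dropping edge epochs or mirroring the signal are reasonable alternatives, and the abstract does not say which, if any, was used.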
Results: In the test set, Kappa values for four-, three-, and two-class classification were 0.38 ± 0.12, 0.46 ± 0.12, and 0.60 ± 0.11, respectively, indicating fair to moderate agreement with PSG-based staging. Wake sensitivity was 0.65 ± 0.15. In the four- and three-class models, sensitivity was high for Light (0.71 ± 0.14) and NREM (0.88 ± 0.09) sleep. REM sensitivity was the lowest, at 0.25 ± 0.24. Sleep/Wake detection showed strong performance (sensitivity 0.94 ± 0.05, AUC 0.90).

Conclusions: Sleep staging using a single wearable showed fair-to-moderate agreement with PSG, even under varied treatment conditions. These findings highlight the potential for developing robust, generalizable classifiers from low-bandwidth wearable data. Despite the limited sample size, the results show promising accuracy with a minimally obtrusive, comfortable wearable, and performance is expected to improve with larger datasets.

This abstract is funded by Apnimed, Inc.
