Abstract

Rationale: Standard sleep staging depends on manual scoring of multi-channel polysomnography (PSG). Home sleep apnea tests (HSATs) and wearable devices have broadened access to sleep architecture assessment by using artificial intelligence (AI) to analyze photoplethysmography (PPG) and actigraphy signals. However, their reliability remains limited because annotated hypnograms for training deep learning models directly on PPG data are scarce. This study aimed to develop a deep neural network for sleep staging that was pretrained on a large PSG dataset and subsequently fine-tuned on PPG-based HSAT data through transfer learning.

Methods: A fully convolutional network (FCN) was first trained on the Multi-Ethnic Study of Atherosclerosis (MESA) dataset, which includes type II PSG recordings with electrocardiography-derived instantaneous heart rate (IHR), oxyhemoglobin saturation (SpO2), and concurrent wrist actigraphy. The model performed both 3-stage (Wake, NREM, REM) and 4-stage (Wake, Light [N1+N2], Deep [N3], REM) sleep classification on 30-second epochs. The pretrained weights were then fine-tuned on a proprietary HSAT dataset (Biologix Sleep Test®, Biologix Sistemas S.A.) containing PPG-derived IHR, high-resolution SpO2 (0.1%), and fingertip accelerometry. Performance was assessed by 10-fold cross-validation (CV) using epoch-level accuracy, Cohen's κ, and class-specific sensitivity and specificity, as well as correlation and Bland-Altman analyses of exam-level sleep indices (total sleep time [TST], sleep efficiency, sleep latency, and wake after sleep onset [WASO]).

Results: A total of 1,742 sleep studies from the MESA dataset, corresponding to 2,189,927 epochs, were used for model development. The FCN model achieved substantial agreement in 10-fold CV: 3-stage accuracy = 84.8%, κ = 0.74; 4-stage accuracy = 79.0%, κ = 0.67.
After fine-tuning on HSAT data (300 patients; 279,269 epochs), 3-stage accuracy remained high (80.8%, κ = 0.64), and 4-stage accuracy reached 70.9% (κ = 0.57). In the 4-stage classification, REM sensitivity was 74.3% on MESA and 80.5% on HSAT, with specificities of 96.3% and 93.8%, respectively; Deep-sleep sensitivity was markedly higher on HSAT (72.1%) than on MESA (32.9%), with corresponding specificities of 91.6% and 97.7%. Exam-level sleep indices showed moderate-to-strong correlation with PSG (MESA: TST r = 0.77, latency r = 0.66, efficiency r = 0.77, WASO r = 0.62; HSAT: TST r = 0.70, latency r = 0.59, efficiency r = 0.62, WASO r = 0.62) and small mean biases (MESA: TST +17.4 min, latency +1.9 min, efficiency +2.8%, WASO -14.1 min; HSAT: TST +7.4 min, latency -0.8 min, efficiency +1.5%, WASO -4.8 min).

Conclusion: Transfer learning enabled reliable 3-class and 4-class sleep staging from PPG-based HSAT signals. These findings show that leveraging large PSG datasets can enhance model generalizability and support scalable AI-based sleep monitoring in wearable and home-diagnostic applications.

This abstract is funded by: None
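The epoch-level metrics named in the Methods (accuracy, Cohen's κ, and class-specific sensitivity and specificity) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names and the ten-epoch toy hypnogram are hypothetical and chosen only to show how each quantity is defined.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of epochs where the predicted stage equals the scored stage."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred, labels):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_freq = Counter(y_true)
    pred_freq = Counter(y_pred)
    # Chance agreement from the marginal stage frequencies of both scorers.
    p_chance = sum(true_freq[c] * pred_freq[c] for c in labels) / (n * n)
    # Undefined (division by zero) if chance agreement is exactly 1.
    return (p_observed - p_chance) / (1.0 - p_chance)


def sensitivity_specificity(y_true, y_pred, positive):
    """One-vs-rest sensitivity and specificity for a single sleep stage."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical 3-stage example (W = Wake, N = NREM, R = REM); not study data.
labels = ["W", "N", "R"]
y_true = ["W", "W", "N", "N", "N", "N", "R", "R", "N", "W"]
y_pred = ["W", "N", "N", "N", "N", "R", "R", "R", "N", "W"]

acc = accuracy(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred, labels)
rem_sens, rem_spec = sensitivity_specificity(y_true, y_pred, "R")
print(f"accuracy={acc:.3f} kappa={kappa:.3f} "
      f"REM sensitivity={rem_sens:.3f} REM specificity={rem_spec:.3f}")
```

Because κ discounts agreement expected by chance, it is lower than raw accuracy whenever one stage dominates the night, which is why the abstract reports both.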
Santos et al. (Fri,) studied this question.
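The exam-level indices and the Bland-Altman bias reported above can likewise be sketched from a hypnogram of 30-second epochs. The abstract does not give the exact definitions used, so the following are assumptions: latency is measured from the first recorded epoch, WASO counts every wake epoch from sleep onset to the end of the recording, efficiency is TST divided by total recording time, and bias is the model estimate minus the PSG reference. All names and numbers are hypothetical.

```python
from statistics import mean, stdev

EPOCH_MIN = 0.5  # 30-second epochs, as stated in the abstract


def sleep_indices(hypnogram, wake_label="W"):
    """Compute TST, latency, efficiency, and WASO (minutes / percent).

    Assumed definitions (not confirmed by the source): the recording
    start is treated as lights-off and the recording end as lights-on.
    """
    if not hypnogram:
        raise ValueError("hypnogram must contain at least one epoch")
    time_in_bed = len(hypnogram) * EPOCH_MIN
    asleep = [stage != wake_label for stage in hypnogram]
    tst = sum(asleep) * EPOCH_MIN
    if any(asleep):
        onset = asleep.index(True)  # first sleep epoch
        latency = onset * EPOCH_MIN
        waso = sum(not a for a in asleep[onset:]) * EPOCH_MIN
    else:
        # No sleep detected: latency spans the whole recording.
        latency = time_in_bed
        waso = 0.0
    return {
        "tst_min": tst,
        "latency_min": latency,
        "efficiency_pct": 100.0 * tst / time_in_bed,
        "waso_min": waso,
    }


def bland_altman(estimates, references):
    """Mean bias and 95% limits of agreement (bias +/- 1.96 SD)."""
    diffs = [e - r for e, r in zip(estimates, references)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation; needs >= 2 exams
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical 8-epoch hypnogram (W = Wake, N = NREM, R = REM).
indices = sleep_indices(["W", "W", "N", "N", "W", "R", "R", "W"])
print(indices)

# Hypothetical per-exam TST values in minutes: model vs. PSG reference.
bias, lower, upper = bland_altman([410.0, 385.0, 450.0],
                                  [400.0, 380.0, 430.0])
print(f"bias={bias:.1f} min, limits of agreement=({lower:.1f}, {upper:.1f})")
```

Under this sign convention, a positive TST bias paired with a negative WASO bias, as in the reported results, would mean the model scores some wake epochs as sleep.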