Polysomnography (PSG) is the gold standard for diagnosing obstructive sleep apnea (OSA), but its cost and complexity limit widespread screening. Electroencephalography (EEG) provides rich temporal–spectral data that deep learning can exploit for automated OSA classification. This study compares three deep learning approaches for four-class OSA severity classification (healthy, mild, moderate, severe) using single-channel EEG (C3-A2) from the Sleep Heart Health Study (SHHS): (1) a baseline Convolutional Neural Network (CNN) trained on spectrograms, (2) a ResNet18 transfer learning model, and (3) a hybrid Vision Transformer–Bidirectional Long Short-Term Memory (ViT–BiLSTM) model with cross-modal attention and self-supervised pretraining. EEG epochs (30 seconds, band-limited to 0.5–40 Hz) were converted into short-time Fourier transform (STFT) spectrograms, class-balanced to 40,000 samples, and split 70/15/15 into training, validation, and test sets with stratification by class. Model performance was evaluated using accuracy, precision, recall, F1-score, AUC, and confusion matrices. The ViT–BiLSTM model achieved the highest accuracy (≈99.0%), outperforming ResNet18 (≈98.68%) and the CNN baseline (≈92.68%), and discriminated adjacent severity classes (e.g., mild vs. moderate OSA) better than the other models. Training and validation curves showed stable convergence with no sign of overfitting. Explainability analyses using saliency and SHAP maps indicated that the models relied on physiologically meaningful EEG features, including delta–theta transitions and arousal-related high-frequency bursts. Overall, transformer-based spectral modeling combined with temporal sequence learning and self-supervised pretraining delivered the most accurate and interpretable results. Although ResNet18 also performs strongly, the ViT–BiLSTM approach shows the greatest potential for scalable, clinically relevant EEG-based sleep apnea screening.
Sun, R. et al. studied this question.
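The preprocessing described above (30 s epochs, 0.5–40 Hz band, STFT spectrograms, stratified 70/15/15 split) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the 125 Hz sampling rate, the 2 s Hann window with 1 s hop, the log-magnitude scaling, the FFT-mask band-pass (used here in place of a conventional IIR filter such as a Butterworth), and the function names `bandpass_fft`, `stft_spectrogram`, and `stratified_split` are all hypothetical choices.

```python
import numpy as np

FS = 125            # ASSUMED EEG sampling rate in Hz; verify against the EDF header
EPOCH_SEC = 30      # epoch length stated in the study
BAND = (0.5, 40.0)  # pass band stated in the study

def bandpass_fft(x, fs, lo, hi):
    """Hypothetical helper: crude zero-phase band-pass that zeroes FFT bins outside [lo, hi] Hz."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    spec[(freqs < lo) | (freqs > hi)] = 0
    return np.fft.irfft(spec, n=x.size)

def stft_spectrogram(x, fs, win_sec=2.0, hop_sec=1.0, fmax=40.0):
    """Hypothetical helper: log-magnitude STFT (Hann window); returns (freqs, S[freq, time])."""
    win, hop = int(win_sec * fs), int(hop_sec * fs)
    window = np.hanning(win)
    n_frames = 1 + (x.size - win) // hop
    # Slice overlapping frames and taper each one with the Hann window.
    frames = np.stack([x[i * hop : i * hop + win] * window for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(win, d=1.0 / fs)
    keep = freqs <= fmax  # drop bins above the pass band
    return freqs[keep], 20.0 * np.log10(mag[:, keep].T + 1e-10)

def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Hypothetical helper: shuffle each class separately, then cut it 70/15/15."""
    rng = np.random.default_rng(seed)
    splits = ([], [], [])
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = int(round(fractions[0] * idx.size))
        n_val = int(round(fractions[1] * idx.size))
        splits[0].extend(idx[:n_train])
        splits[1].extend(idx[n_train:n_train + n_val])
        splits[2].extend(idx[n_train + n_val:])
    return tuple(np.array(s, dtype=int) for s in splits)

if __name__ == "__main__":
    # Synthetic epoch: a 10 Hz rhythm plus 60 Hz interference that the band-pass removes.
    t = np.arange(FS * EPOCH_SEC) / FS
    x = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 60 * t)
    freqs, S = stft_spectrogram(bandpass_fft(x, FS, *BAND), FS)
    print(S.shape, freqs[np.argmax(S.mean(axis=1))])

    # Four balanced severity classes totalling 40,000 samples, as in the study.
    labels = np.repeat(np.arange(4), 10_000)
    train, val, test = stratified_split(labels)
    print(len(train), len(val), len(test))
```

Splitting within each class keeps the four severity levels equally represented in every subset, which matters when adjacent classes such as mild and moderate OSA are hard to separate.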