What question did this study set out to answer?

The aim is to classify the severity of obstructive sleep apnea using single-channel EEG data through advanced deep learning models.

March 13, 2026 | Open Access

Hybrid Vision Transformer–BiLSTM for EEG-Based Sleep Apnea Severity Classification

Key Points

EEG epochs were converted into short-time Fourier transform (STFT) spectrograms.
Three deep learning models were trained: a baseline CNN, ResNet18, and a hybrid ViT–BiLSTM.
Model performance was evaluated using accuracy, precision, recall, F1-score, AUC, and confusion matrices.
The EEG data were balanced into 40,000 samples and split into subsets using a 70/15/15 stratified ratio.
The ViT–BiLSTM model achieved the highest classification accuracy (≈99.0%).
ResNet18 reached ≈98.68% accuracy, and the CNN baseline reached ≈92.68%.
The ViT–BiLSTM model showed superior discrimination of adjacent classes, such as mild vs. moderate OSA.
Explainability analyses using saliency and SHAP maps indicated that the models relied on physiologically meaningful EEG features, such as delta–theta transitions.
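
The spectrogram step in the key points can be illustrated with a minimal sketch. This is not the authors' pipeline: the 2-second Hann window, 1-second hop, 125 Hz sampling rate, log-power scaling, and the function name `stft_spectrogram` are all assumptions made for illustration. Only the 30-second epoch length and the 0.5–40 Hz band come from the abstract.

```python
import numpy as np

def stft_spectrogram(epoch, fs, win_sec=2.0, hop_sec=1.0, f_lo=0.5, f_hi=40.0):
    """Return (freqs, times, log-power spectrogram) for one EEG epoch.

    Window length, hop, and dB scaling are illustrative assumptions,
    not values reported by the study.
    """
    epoch = np.asarray(epoch, dtype=float)
    n_win = int(round(win_sec * fs))
    n_hop = int(round(hop_sec * fs))
    window = np.hanning(n_win)

    # Slice the epoch into overlapping, windowed frames.
    starts = np.arange(0, len(epoch) - n_win + 1, n_hop)
    frames = np.stack([epoch[s:s + n_win] * window for s in starts])

    # One-sided FFT of each frame, converted to power.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(n_win, d=1.0 / fs)

    # Keep only the band of interest (0.5-40 Hz by default).
    keep = (freqs >= f_lo) & (freqs <= f_hi)
    times = (starts + n_win / 2) / fs  # frame centers in seconds

    # Rows are frequency bins, columns are time frames; values in dB.
    log_power = 10.0 * np.log10(power[:, keep].T + 1e-12)
    return freqs[keep], times, log_power

# Usage with a synthetic 30 s, 10 Hz test signal (assumed fs = 125 Hz).
fs = 125
t = np.arange(30 * fs) / fs
freqs, times, spec = stft_spectrogram(np.sin(2 * np.pi * 10 * t), fs)
print(spec.shape)
```

The resulting 2-D array is the kind of image-like input that a CNN, ResNet18, or ViT front end can consume; in practice it would usually be resized and normalized to whatever input shape the chosen model expects.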

Abstract

Polysomnography (PSG) is the gold standard for diagnosing obstructive sleep apnea (OSA), but its cost and complexity limit widespread screening. EEG provides rich temporal–spectral data that deep learning can leverage for automated OSA classification. This study compares three deep learning approaches for four-class OSA severity classification (healthy, mild, moderate, severe) using single-channel EEG (C3-A2) from the Sleep Heart Health Study (SHHS): (1) a baseline Convolutional Neural Network (CNN) trained on spectrograms, (2) a ResNet18 transfer learning model, and (3) a hybrid Vision Transformer–Bidirectional Long Short-Term Memory (ViT–BiLSTM) model with cross-modal attention and self-supervised pretraining. EEG epochs (30 seconds, 0.5–40 Hz) were converted into short-time Fourier transform spectrograms, balanced into 40,000 samples, and split using a 70/15/15 stratified ratio. Model performance was evaluated using accuracy, precision, recall, F1-score, AUC, and confusion matrices. The ViT–BiLSTM model achieved the highest accuracy (≈99.0%), outperforming ResNet18 (≈98.68%) and the CNN baseline (≈92.68%), with superior discrimination of adjacent classes (e.g., mild vs. moderate OSA). Training and validation curves showed stable convergence without overfitting. Explainability analyses using saliency and SHAP maps revealed that models relied on physiologically meaningful EEG features, including delta–theta transitions and arousal-related high-frequency bursts. Overall, transformer-based spectral modeling combined with temporal sequence learning and self-supervised pretraining delivers the most accurate and interpretable results. While ResNet18 offers strong performance, the ViT–BiLSTM approach shows the greatest potential for scalable, clinically relevant EEG-based sleep apnea screening.
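
To make the evaluation metrics named in the abstract concrete, the sketch below computes a confusion matrix and derives accuracy, precision, recall, and F1-score from it for the four severity classes. Macro averaging (an unweighted mean over classes) is an assumption here, since the abstract does not state how per-class scores were combined. AUC is omitted because it needs predicted probabilities rather than hard labels. The function names and the toy labels are hypothetical.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix with rows = true class, columns = predicted class."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def macro_metrics(cm):
    """Accuracy plus macro-averaged precision, recall, and F1 (assumed averaging)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total

    precisions, recalls, f1s = [], [], []
    for i in range(n):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(n))  # column sum
        actual = sum(cm[i])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)

    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }

# Toy example: 0 = healthy, 1 = mild, 2 = moderate, 3 = severe.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 2, 2, 2, 3, 3]  # one mild epoch misread as moderate
cm = confusion_matrix(y_true, y_pred, n_classes=4)
print(macro_metrics(cm))
```

In the toy example the single error falls between mild and moderate, the adjacent-class confusion the abstract highlights; it shows up as an off-diagonal count in row 1, column 2 of the matrix.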

Cite This Study

R et al. (Sun,) studied this question.

synapsesocial.com/papers/69b3acf302a1e69014ccf1fc https://doi.org/10.1016/j.sleepe.2026.100137
