What question did this study set out to answer?

This study aims to validate an automated ECG-based sleep staging algorithm against manual scoring by sleep technicians in a clinical population.

May 10, 2026

1116 Validating an Automated ECG-Based Sleep Staging Against Expert Scoring in a Veteran Population with Suspected Sleep Disorders

Q: What is the clinical evidence from this study?

Study design: Observational. Population: Suspected sleep disorders (n=30). Intervention: Automated AI-based ECG sleep staging algorithm vs. manual scoring by clinical sleep technicians. Primary outcome: Overall percent agreement (OPA) and Cohen's kappa between automated and manual scoring (OPA 77.13%, 95% CI 73.81%-79.95%; Cohen's kappa 0.67, 95% CI 0.62-0.71).

Key Result

An automated AI-based ECG sleep staging algorithm demonstrated substantial agreement with manual scoring, achieving an overall percent agreement of 77.13% and a Cohen's kappa of 0.67.

Key Points

This study aims to validate an automated ECG-based sleep staging algorithm against manual scoring by sleep technicians in a clinical population.
Analyzed data from 30 participants at the VA Greater Los Angeles Healthcare System.
Used an AI-based ECG algorithm for 30-second epoch classifications of sleep stages: Wake, N1, N2, N3, and REM.
Evaluated agreement using overall percent agreement (OPA) and Cohen’s kappa, plus stage-specific positive percent agreement (PPA), negative percent agreement (NPA), and positive predictive value (PPV).
Overall percent agreement between the ECG algorithm and human scoring was 77.13% (95% CI: 73.81%-79.95%).
Cohen’s kappa value was 0.67 (95% CI: 0.62-0.71), indicating substantial agreement.
High PPA was found for Wake (91.00%), N2 (83.49%), and REM (81.40%), while N3 (51.39%) and N1 (27.13%) had lower agreement.
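
The agreement metrics listed above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: the function names and the toy epoch labels are invented here, and the one-vs-rest definitions of PPA, NPA, and PPV (with manual scoring as the reference) are an assumption, since the abstract does not spell them out.

```python
from collections import Counter


def overall_percent_agreement(reference, predicted):
    """Fraction of epochs where both scorers assign the same stage."""
    matches = sum(r == p for r, p in zip(reference, predicted))
    return matches / len(reference)


def cohens_kappa(reference, predicted):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    n = len(reference)
    p_o = overall_percent_agreement(reference, predicted)
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    # Expected agreement if the two scorers labelled epochs independently.
    p_e = sum(ref_counts[s] * pred_counts[s] for s in ref_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def stage_metrics(reference, predicted, stage):
    """One-vs-rest PPA, NPA and PPV for a single stage (assumed definitions)."""
    pairs = list(zip(reference, predicted))
    tp = sum(r == stage and p == stage for r, p in pairs)
    fn = sum(r == stage and p != stage for r, p in pairs)
    fp = sum(r != stage and p == stage for r, p in pairs)
    tn = len(pairs) - tp - fn - fp
    nan = float("nan")
    return {
        "PPA": tp / (tp + fn) if tp + fn else nan,  # agreement on reference-positive epochs
        "NPA": tn / (tn + fp) if tn + fp else nan,  # agreement on reference-negative epochs
        "PPV": tp / (tp + fp) if tp + fp else nan,  # how often a predicted stage is confirmed
    }


# Toy 30-second epoch labels, invented for illustration only (not study data).
manual = ["Wake", "N1", "N2", "N2", "N3", "REM", "N2", "Wake"]
auto = ["Wake", "N2", "N2", "N2", "N2", "REM", "N2", "Wake"]

print(overall_percent_agreement(manual, auto))
print(cohens_kappa(manual, auto))
print(stage_metrics(manual, auto, "N2"))
```

In the toy data the algorithm absorbs N1 and N3 epochs into N2, which leaves PPA for N2 high while lowering its PPV. The low N1 and N3 PPA values reported above are at least consistent with that kind of pattern, though the abstract does not give the confusion matrix.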

Study Design

Type

Observational (n=30)

Single center (VA Greater Los Angeles Healthcare System)

Structured PICO

Does an automated AI-based ECG sleep staging algorithm accurately agree with manual scoring in patients with suspected sleep disorders?

Population

30 participants enrolled at the VA Greater Los Angeles Healthcare System for evaluation of suspected sleep disorders

Intervention

Automated AI-based ECG sleep staging algorithm generating 30-second epoch classifications from lead-II ECG features

Comparator

Manual scoring by clinical sleep technicians

Outcome

Agreement between automated and manual scoring evaluated using overall percent agreement (OPA) and Cohen’s kappa (surrogate outcome)

An automated AI-based ECG sleep staging algorithm showed substantial agreement with manual scoring, highlighting its potential as a scalable tool for sleep architecture assessment.

Main Result

Effect estimate: Cohen's kappa 0.67 (95% CI 0.62-0.71)
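
For readers less familiar with the statistic, the standard definition of Cohen's kappa is given below. The symbols are notation chosen here, not taken from the source: $N$ is the total number of epochs, $n_{kk}$ the number of epochs both scorers assign to stage $k$, and $n_{k\cdot}$, $n_{\cdot k}$ the per-stage totals for the manual and automated scorer respectively.

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}, \qquad
p_o = \frac{1}{N}\sum_{k} n_{kk}, \qquad
p_e = \frac{1}{N^{2}}\sum_{k} n_{k\cdot}\, n_{\cdot k}
```

Here $p_o$ is the observed agreement (the OPA) and $p_e$ the agreement expected by chance given each scorer's stage frequencies. The "substantial" label matches the conventional Landis and Koch band of 0.61 to 0.80.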

Limitations

N3 detection requires further refinement

Abstract

Introduction: AI-based ECG-only sleep staging could be essential for diagnosing sleep disorders and expanding access to automated phenotyping. This would enable versatile analysis of polysomnography recordings, including applications in cardiometabolic and mental health. Cardiorespiratory coupling suggests that sleep stages can be inferred through autonomic changes. This study assesses the consistency between an automated AI-based ECG-based sleep staging algorithm and manual scoring by sleep technicians in a clinical population.

Methods: We analyzed data from the first 30 participants enrolled at the VA Greater Los Angeles Healthcare System for evaluation of suspected sleep disorders. The ECG-based algorithm generated 30-second epoch classifications for Wake, N1, N2, N3, and REM using lead-II ECG features. Human scoring performed by clinical sleep technicians served as the reference standard. Agreement between automated and manual scoring was evaluated using overall percent agreement (OPA) and Cohen’s kappa. For each sleep stage, we additionally calculated positive percent agreement (PPA), negative percent agreement (NPA), and positive predictive value (PPV). To obtain point estimates and 95% confidence intervals, we applied a bootstrap procedure with 10,000 resamples at the EDF-file level, computing all metrics (OPA, Cohen’s kappa, PPA, NPA, and PPV) using this resampling framework.

Results: A total of 24,949 epochs were analyzed. The OPA between the ECG-based algorithm and human scorers was 77.13% (95% CI: 73.81%-79.95%), with a Cohen’s kappa of 0.67 (95% CI: 0.62-0.71), indicating substantial agreement. Stage-specific results for ECG-based staging showed high PPA for Wake (91.00%), N2 (83.49%), and REM (81.40%), moderate PPA for N3 (51.39%), and low PPA for N1 (27.13%). NPA values were high across stages, and PPV was strongest for Wake and REM.

Conclusion: The automated AI-based ECG staging demonstrated robust agreement with traditional manual EEG scoring, particularly for accurately identifying Wake, N2, and REM sleep. While N3 detection requires further refinement, these results highlight the strong potential of ECG as a simple, scalable, and standalone tool for assessing sleep architecture in clinical settings.

Support (if any): Medibio Limited.
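
The file-level bootstrap described in the Methods can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the abstract does not say which interval type was used, so a simple percentile interval is assumed, and the function names and toy recordings are hypothetical. Each recording is represented as a pair of equal-length label lists (manual, automated).

```python
import random


def percent_agreement(recordings):
    """OPA pooled over all epochs of the given recordings."""
    matches = total = 0
    for manual, auto in recordings:
        matches += sum(m == a for m, a in zip(manual, auto))
        total += len(manual)
    return matches / total


def cluster_bootstrap_ci(recordings, metric, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile CI from resampling whole recordings with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_resamples):
        # Draw recordings, not single epochs, so that the correlation
        # between epochs of the same night is preserved in each resample.
        sample = rng.choices(recordings, k=len(recordings))
        stats.append(metric(sample))
    stats.sort()
    lower = stats[round((alpha / 2) * n_resamples)]
    upper = stats[round((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Toy recordings, invented for illustration only (not study data).
recordings = [
    (["Wake", "N2", "N2", "REM"], ["Wake", "N2", "N3", "REM"]),
    (["N1", "N2", "N3", "N3"], ["N2", "N2", "N3", "N2"]),
    (["Wake", "Wake", "N2", "REM"], ["Wake", "Wake", "N2", "REM"]),
]

print(percent_agreement(recordings))
print(cluster_bootstrap_ci(recordings, percent_agreement))
```

Resampling at the recording level rather than the epoch level matters because epochs within one night are not independent. An epoch-level bootstrap over 24,949 epochs would likely give intervals that are too narrow for a sample of 30 participants.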

Cite This Study

Grassi et al. conducted an observational study in patients with suspected sleep disorders (n=30). An automated AI-based ECG sleep staging algorithm was compared with manual scoring by clinical sleep technicians on overall percent agreement (OPA) and Cohen's kappa (Cohen's kappa 0.67, 95% CI 0.62-0.71). The algorithm demonstrated substantial agreement with manual scoring, achieving an overall percent agreement of 77.13% and a Cohen's kappa of 0.67.

synapsesocial.com/papers/6a0021b7c8f74e3340f9ca4a https://doi.org/10.1093/sleep/zsag091.1115
