The proposed modular deep-learning framework reduced RMSSD mean absolute error to 10.56 ms compared to 45.12 ms for HeartPy and 27.93 ms for NeuroKit2 on the combined ECG test set.
Does a modular deep-learning framework improve the accuracy and robustness of RMSSD estimation from ultra-short ECG windows compared to classical algorithmic baselines?
A novel modular deep-learning framework significantly improves the accuracy and robustness of real-time HRV (RMSSD) estimation from ultra-short ECG windows compared to standard algorithmic toolkits.
RMSSD MAE (combined test set): 10.56 ms (proposed) vs 45.12 ms (HeartPy) / 27.93 ms (NeuroKit2)
p-value: p < 10^-300
We present a universal modular deep-learning framework and demonstrate its application to low-latency, streaming-compatible heart rate variability (HRV) analysis using RMSSD as an exemplar metric. A convolutional autoencoder is first pretrained and then reused as a frozen encoder that maps raw ECG windows to a compact latent sequence. Task-specific heads, each comprising a BiLSTM adapter, a shallow Conv1D refinement, and temporal attention pooling, operate on this shared representation. A discriminator head screens low-quality windows, while a regression head estimates RMSSD; a gated inference block routes outputs so that RMSSD is produced only when the discriminator exceeds a threshold, replicating a robust “mask-then-estimate” pipeline in a single deployable graph. Using LUDB and PTB-XL, with segmentation-assisted peak extraction for PTB-XL, plus an out-of-distribution Apple Watch subset, we enforce rigorous quality assurance to derive validity labels and RMSSD targets. Compared to two strong classical baselines (HeartPy and NeuroKit2), our discriminator improves combined-set accuracy to 92.12% (vs. 80.54% / 85.58%) with F1 of 95.43% (vs. 88.82% / 91.99%). On RMSSD estimation, the proposed model reduces combined MAE to 10.56 ms (from 45.12 ms / 27.93 ms) and sharply curtails tail errors (P95: 47.00 ms vs. 313.35 ms / 167.84 ms), indicating substantially improved robustness under pathological and noisy ECG. On a small out-of-distribution Apple Watch subset used as a sanity check for acquisition shift, the model attains the lowest MAE (7.57 ms vs. 13.96 ms / 9.61 ms) under a selective gating regime. The end-to-end model is compact (2.62 M parameters; 10.07 MB on disk) and real-time capable, achieving 15.0 ms mean latency at batch size 1 (66.5 windows/s) and scaling to 4.49k windows/s at batch size 1024 on a single consumer-grade GPU.
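The target metric and the "mask-then-estimate" gating described above can be illustrated with a minimal sketch. It uses the standard definition of RMSSD (root mean square of successive differences between RR intervals) and a simple threshold rule. The function names, the example RR values, and the 0.5 threshold are assumptions made for illustration; the abstract does not state the threshold or the authors' implementation.

```python
import math
from typing import Optional, Sequence


def rmssd_ms(rr_intervals_ms: Sequence[float]) -> float:
    """Standard RMSSD: root mean square of successive RR-interval differences (ms)."""
    if len(rr_intervals_ms) < 2:
        raise ValueError("need at least two RR intervals")
    # Differences between each RR interval and the one before it.
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def gated_rmssd(
    quality_score: float,
    rmssd_estimate_ms: float,
    threshold: float = 0.5,  # hypothetical value; not given in the abstract
) -> Optional[float]:
    """Emit the RMSSD estimate only when the quality score exceeds the threshold."""
    return rmssd_estimate_ms if quality_score > threshold else None


# Illustrative RR intervals in milliseconds (made-up values, not study data).
rr = [800.0, 810.0, 790.0, 805.0]
reference = rmssd_ms(rr)

# A window judged clean passes its estimate through; a noisy one is masked.
accepted = gated_rmssd(quality_score=0.9, rmssd_estimate_ms=reference)
rejected = gated_rmssd(quality_score=0.2, rmssd_estimate_ms=reference)
```

In the framework itself the estimate comes from the regression head and the score from the discriminator head. Here both are plain numbers, so only the routing logic is shown.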
Dobrosolski et al. conducted a study in heart rate variability (HRV) analysis (n=21,999). A modular deep-learning framework was compared with HeartPy and NeuroKit2 on RMSSD mean absolute error (MAE) on the combined test set (p < 10^-300). The proposed modular deep-learning framework reduced RMSSD mean absolute error to 10.56 ms compared to 45.12 ms for HeartPy and 27.93 ms for NeuroKit2 on the combined ECG test set.