
May 6, 2026 | Open Access

Robust and Calibrated ECG Heartbeat Classification via Hybrid Convolutional, Temporal and Attention-Based Learning

Key Points

This research addresses the challenges of ECG heartbeat classification in arrhythmia detection and cardiac monitoring.
Developed hybrid deep learning architectures combining convolutional feature extraction, bidirectional GRU (BiGRU) temporal modeling, and attention mechanisms.
Assessed probabilistic reliability with calibration metrics such as the Brier Score and Expected Calibration Error (ECE).
Evaluated models on preprocessed, fixed-length segments of the ECG Heartbeat dataset under controlled conditions.
The CNN model achieved the highest testing accuracy (98.44%), largely owing to strong performance on the majority class.
The hybrid CNN + BiGRU + Attention model attained 97.80% accuracy with a higher macro F1-score (0.9052).
Results suggest improved training stability and good calibration behavior for the hybrid model (Brier Score = 0.0417, ECE = 0.1023).
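The key points pair a higher accuracy for the CNN with a higher macro F1-score for the hybrid model. Under class imbalance these metrics can diverge sharply: accuracy is dominated by the majority class, while macro F1 gives every class equal weight. The following minimal NumPy sketch illustrates the effect on hypothetical toy labels (not drawn from the ECG Heartbeat dataset); the function names are illustrative, and assigning F1 = 0 to a class with no true and no predicted samples is an assumption.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 scores.

    Assumption: a class whose F1 denominator is zero gets F1 = 0.
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(scores))


# Hypothetical imbalanced toy labels: 90 majority-class beats, 5 each of two minority classes.
y_true = np.array([0] * 90 + [1] * 5 + [2] * 5)
# A degenerate classifier that always predicts the majority class.
y_pred = np.zeros_like(y_true)

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred, num_classes=3)
print(f"accuracy = {acc:.4f}, macro F1 = {mf1:.4f}")
```

This prints `accuracy = 0.9000, macro F1 = 0.3158`: a classifier that never detects a minority class still scores 90% accuracy, which is why macro F1 is the more informative figure for minority-class detection.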

Abstract

Electrocardiogram (ECG) heartbeat classification is an essential component of automated arrhythmia detection and intelligent cardiac monitoring systems. Traditionally, ECG analysis has depended on manual interpretation by clinicians and on conventional machine learning approaches based on handcrafted features, which are labor-intensive, noise-sensitive, and inadequate for capturing the complex nonlinear morphological and temporal characteristics of ECG signals. Furthermore, real-world ECG datasets are highly imbalanced and noisy, and they exhibit overlapping waveform patterns across heartbeat classes, leading to biased learning, poor minority-class detection, and unreliable predictions. To address these challenges, this paper presents a calibration-aware, reliability-oriented evaluation framework for ECG heartbeat classification, incorporating hybrid deep learning architectures that combine convolutional feature extraction, bidirectional GRU-based temporal modeling, and attention mechanisms. The framework assesses probabilistic reliability using calibration metrics such as the Brier Score and Expected Calibration Error (ECE), rather than explicitly modeling predictive uncertainty. Experimental results on the ECG Heartbeat dataset show that the CNN achieves the highest testing accuracy (98.44%), largely due to strong performance on the majority class in an imbalanced setting. Among hybrid approaches, a representative CNN + BiGRU + Attention model attains a competitive accuracy of 97.80%, along with a higher macro F1-score (0.9052), improved training stability, and good calibration behavior (Brier Score = 0.0417, ECE = 0.1023). Because the experiments are conducted on preprocessed, fixed-length segments, the results reflect performance under controlled conditions rather than real-world clinical deployment and should therefore be interpreted as a benchmark-level evaluation. Finally, no single model consistently outperforms the others across all evaluation criteria, as different metrics capture distinct aspects of performance.
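Both calibration metrics named in the abstract are computed from predicted class probabilities. The following minimal NumPy sketch is not the authors' code and rests on stated assumptions: the Brier Score is the multi-class form averaged over samples (some works also divide by the number of classes, and the abstract does not say which form is used), and ECE uses 10 equal-width bins of top-label confidence (the bin count is an assumption). The probabilities and labels are hypothetical.

```python
import numpy as np


def brier_score(probs, labels):
    """Multi-class Brier Score: mean over samples of the summed squared error
    between the predicted probability vector and the one-hot label.

    Assumption: no additional division by the number of classes.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    onehot = np.eye(probs.shape[1])[labels]
    return float(np.mean(np.sum((probs - onehot) ** 2, axis=1)))


def expected_calibration_error(probs, labels, n_bins=10):
    """ECE over equal-width bins of top-label confidence.

    Assumption: 10 bins of the form (lo, hi]; top-label confidence is always
    >= 1/K > 0, so every sample lands in some bin.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    confidences = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # Weight the bin's |accuracy - mean confidence| gap by its share of samples.
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return float(ece)


# Hypothetical softmax outputs for four beats over three classes.
probs = np.array([
    [0.95, 0.03, 0.02],  # true class 0, predicted 0
    [0.85, 0.10, 0.05],  # true class 0, predicted 0
    [0.55, 0.35, 0.10],  # true class 1, predicted 0 (misclassified)
    [0.15, 0.75, 0.10],  # true class 1, predicted 1
])
labels = np.array([0, 0, 1, 1])

print(f"Brier = {brier_score(probs, labels):.4f}")
print(f"ECE   = {expected_calibration_error(probs, labels):.4f}")
```

On this toy input the sketch prints `Brier = 0.2172` and `ECE = 0.2500`. Because ECE depends on the binning scheme, values computed with different bin counts or bin strategies are not directly comparable.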
