The deep learning model's AUROC for predicting echocardiographic abnormalities dropped from 0.78 (no arrhythmia) to 0.68 in AF and 0.62 in pacemaker rhythm. In both subgroups the model classified every case as positive, giving 100% sensitivity but 0% specificity.
Does the presence of arrhythmias reduce the predictive accuracy of a deep learning model for echocardiographic abnormalities in patients undergoing ECG?
Arrhythmias, particularly atrial fibrillation and pacemaker rhythms, significantly degrade the specificity of deep learning models used to predict echocardiographic abnormalities from ECGs, limiting their clinical utility.
Absolute event rate: 0% vs 0%
Abstract

Background
Heart failure requires early diagnosis, and targeted screening could improve if functional and morphological cardiac abnormalities were detectable on an electrocardiogram (ECG). While deep learning (DL) models have shown promise in predicting echocardiographic findings, their performance in patients with arrhythmias remains underexplored.

Purpose
This study aimed to evaluate the impact of arrhythmias, including atrial fibrillation (AF), premature ventricular contractions (PVC), and pacemaker rhythm (PM), on the ability of a DL-based model to predict echocardiographic abnormalities. By assessing how arrhythmias influence model accuracy, we sought to identify potential limitations and areas for improvement in artificial intelligence (AI)-assisted cardiac screening.

Methods
We utilized 229,439 paired ECG and echocardiography datasets from eight centers: six for model development and two for external validation. In previous analyses, external validation was conducted using data from both centers. In this study, we focused on a subset of one validation center that provided arrhythmia-labeled data, consisting of 29,411 ECG-echocardiography pairs. DL-based models were trained to predict 12 echocardiographic findings related to heart failure. Logistic regression was applied to generate a composite label, considered positive if any of the 12 findings were present. Model performance metrics, including area under the receiver-operating characteristic curve (AUROC), sensitivity, and specificity, were evaluated for AF, PVC, and PM subgroups individually, as well as for groups stratified by the presence or absence of any arrhythmia.

Results
The composite label achieved an AUROC of 0.78 in patients without arrhythmias, which decreased to 0.75 in those with any arrhythmia. Subgroup analyses revealed that this decline was primarily driven by AF and PM, with AUROCs of 0.68 and 0.62, respectively. In contrast, PVC showed a relatively smaller impact, with an AUROC of 0.80. Further examination of the confusion matrix revealed that in the presence of AF or PM, the model classified all cases as positive for the composite label, resulting in 100% sensitivity but zero specificity. This pattern suggests that the model struggles to distinguish between positive and negative cases in these subgroups, likely due to altered ECG signal patterns or limited training data.

Conclusions
Arrhythmias significantly impact the performance of DL-based models for predicting echocardiographic abnormalities. While sensitivity remains high, reduced specificity in AF and PM groups limits clinical utility and may lead to increased false positives. Further investigation is needed to determine whether these limitations stem from inadequate training data or intrinsic model weaknesses. This study highlights the need to address arrhythmia-related limitations to improve the robustness of DL-based screening tools in clinical practice.

Figure: ROC curve with and without arrhythmia
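The abstract reports an AUROC above 0.5 in the AF and PM subgroups alongside 100% sensitivity and zero specificity. The two are compatible because AUROC is computed from the ranking of scores across all thresholds, whereas sensitivity and specificity depend on one fixed decision threshold. The following is a minimal sketch of a per-rhythm subgroup evaluation, not the authors' pipeline. The records, the subgroup names as dictionary keys, the function names, and the 0.5 threshold are all hypothetical, since the abstract does not state the threshold or publish the data.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy records: (rhythm subgroup, true composite label, model score).
# These values are invented for illustration and are not study data.
RECORDS: List[Tuple[str, int, float]] = [
    ("AF", 1, 0.90), ("AF", 1, 0.80), ("AF", 1, 0.60),
    ("AF", 0, 0.85), ("AF", 0, 0.70), ("AF", 0, 0.55),
    ("none", 1, 0.90), ("none", 1, 0.40),
    ("none", 0, 0.30), ("none", 0, 0.60),
]


def auroc(labels: List[int], scores: List[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(labels: List[int], scores: List[float], threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("sensitivity and specificity need both classes present")
    return tp / (tp + fn), tn / (tn + fp)


def metrics_by_subgroup(
    records: List[Tuple[str, int, float]], threshold: float = 0.5
) -> Dict[str, Dict[str, float]]:
    """Group records by rhythm label and compute the three metrics per group."""
    grouped: Dict[str, Tuple[List[int], List[float]]] = defaultdict(lambda: ([], []))
    for rhythm, label, score in records:
        grouped[rhythm][0].append(label)
        grouped[rhythm][1].append(score)
    out: Dict[str, Dict[str, float]] = {}
    for rhythm, (labels, scores) in grouped.items():
        sensitivity, specificity = sens_spec(labels, scores, threshold)
        out[rhythm] = {
            "auroc": auroc(labels, scores),
            "sensitivity": sensitivity,
            "specificity": specificity,
        }
    return out


for rhythm, metrics in metrics_by_subgroup(RECORDS, threshold=0.5).items():
    print(rhythm, {name: round(value, 2) for name, value in metrics.items()})
```

In the toy AF group every score lies above the assumed threshold, so all cases are called positive (sensitivity 1.0, specificity 0.0) even though the scores still rank positives above negatives more often than not (AUROC about 0.67). A pattern like this would suggest the scores in a subgroup are shifted upward rather than uninformative, which is one possible reading of the reported result and not a conclusion the abstract itself draws.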
Fujiki et al. reported that the deep learning model's AUROC for predicting echocardiographic abnormalities dropped from 0.78 (no arrhythmia) to 0.68 in AF and 0.62 in pacemaker rhythm, with 100% sensitivity but 0% specificity in those two subgroups.