March 22, 2021 · Open Access

Considerations on Performance Evaluation of Atrial Fibrillation Detectors

Structured PICO

Population

Real and simulated ECG signals with paroxysmal atrial fibrillation from the Saint Petersburg Atrial Fibrillation Database (SPAFDB, n=36 recordings), MIT-BIH Atrial Fibrillation Database (AFDB, n=23 recordings), and Long-Term AF Database (LTAFDB, n=84 recordings).

Intervention

Three types of atrial fibrillation detectors: rhythm-based, rhythm- and morphology-based, and deep learning (DL)-based (1D convolutional neural network).

Outcome

Detector performance (Accuracy, Sensitivity, Specificity, F1 score, Matthews correlation coefficient) evaluated via beat-to-beat, segment-to-segment, or episode-to-episode comparison.
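The five measures listed above can all be derived from the four confusion-matrix counts. The following is a minimal sketch using their standard textbook definitions, not the study's own code; the function name `detection_metrics` and the example counts are illustrative assumptions.

```python
import math


def detection_metrics(tp, fp, tn, fn):
    """Standard binary detection metrics from confusion-matrix counts.

    Hypothetical helper for illustration; returns NaN where a metric
    is undefined because its denominator is zero.
    """
    nan = float("nan")
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else nan
    sensitivity = tp / (tp + fn) if (tp + fn) else nan
    specificity = tn / (tn + fp) if (tn + fp) else nan
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else nan
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else nan
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Made-up, heavily imbalanced example: AF is rare among the evaluated units.
example = detection_metrics(tp=50, fp=100, tn=9800, fn=50)
```

With these invented counts, accuracy is 0.985 while sensitivity is 0.5 and F1 is 0.4. This shows why accuracy alone can be misleading when AF makes up a small share of the data.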

The evaluation of atrial fibrillation detector performance is highly dependent on the chosen comparison methodology (e.g., beat-to-beat vs. episode-to-episode) and signal characteristics, highlighting the need for standardized evaluation frameworks.

Abstract

OBJECTIVE: A large number of atrial fibrillation (AF) detectors have been published in recent years, which makes the comparison of detector performance central, yet such comparisons are not always made consistently. The aim of this study is to shed needed light on aspects crucial to the evaluation of detection performance. METHODS: Three types of AF detector, using either information on rhythm, rhythm and morphology, or segments of ECG samples, are implemented and studied on both real and simulated ECG signals. The properties of different performance measures are investigated, for example, in relation to dataset imbalance. RESULTS: The results show that performance can differ considerably depending on the way detector output is compared to database annotations, i.e., beat-to-beat, segment-to-segment, or episode-to-episode comparison. Moreover, depending on the type of detector, the results substantiate that physiological and technical factors, e.g., changes in ECG morphology, rate of atrial premature beats, and noise level, can have a considerable influence on performance. CONCLUSION: The present study demonstrates overall strengths and weaknesses of different types of detector, highlights challenges in AF detection, and proposes five recommendations on how to handle data and characterize performance.
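To illustrate how the comparison scheme alone can change the reported numbers, the sketch below scores one detector output against one annotation in two ways. It is not the study's procedure: the function names, the per-beat 0/1 label encoding, and the rule that an episode counts as detected when any detected episode overlaps it are all assumptions made for this example.

```python
def beat_counts(ref, det):
    """Beat-to-beat comparison: return (tp, fp, tn, fn) over per-beat labels."""
    tp = sum(1 for r, d in zip(ref, det) if r and d)
    fp = sum(1 for r, d in zip(ref, det) if not r and d)
    tn = sum(1 for r, d in zip(ref, det) if not r and not d)
    fn = sum(1 for r, d in zip(ref, det) if r and not d)
    return tp, fp, tn, fn


def episodes(labels):
    """Return (start, end) pairs, end exclusive, for each run of AF beats."""
    runs, start = [], None
    for i, value in enumerate(labels):
        if value and start is None:
            start = i
        elif not value and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(labels)))
    return runs


def overlaps(a, b):
    """True if half-open intervals a and b share at least one beat."""
    return a[0] < b[1] and b[0] < a[1]


def episode_counts(ref, det):
    """Episode-to-episode comparison: return (tp, fp, fn).

    Assumed rule: an annotated episode is a hit if any detected episode
    overlaps it; a detected episode with no overlap is a false alarm.
    """
    ref_eps, det_eps = episodes(ref), episodes(det)
    tp = sum(1 for r in ref_eps if any(overlaps(r, d) for d in det_eps))
    fn = len(ref_eps) - tp
    fp = sum(1 for d in det_eps if not any(overlaps(d, r) for r in ref_eps))
    return tp, fp, fn


# Made-up labels: one 10-beat AF episode, of which only 2 beats are detected.
ref = [0] * 5 + [1] * 10 + [0] * 5
det = [0] * 5 + [1] * 2 + [0] * 13
```

For these invented labels, the beat-to-beat comparison finds 2 of 10 AF beats (sensitivity 0.2), whereas the episode-to-episode comparison under the assumed overlap rule finds 1 of 1 episodes (sensitivity 1.0). The detector output is identical in both cases.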

Butkuvienė et al. (2021) studied this question.

synapsesocial.com/papers/6a212960ffa0738687c3ccb6 https://doi.org/10.1109/tbme.2021.3067698
