The evaluation of atrial fibrillation detector performance is highly dependent on the chosen comparison methodology (e.g., beat-to-beat vs. episode-to-episode) and signal characteristics, highlighting the need for standardized evaluation frameworks.
OBJECTIVE: A large number of atrial fibrillation (AF) detectors have been published in recent years, which makes the comparison of detector performance central, although such comparisons are not always carried out consistently. The aim of this study is to shed light on aspects crucial to the evaluation of detection performance. METHODS: Three types of AF detector, using either information on rhythm, rhythm and morphology, or segments of ECG samples, are implemented and studied on both real and simulated ECG signals. The properties of different performance measures are investigated, for example, in relation to dataset imbalance. RESULTS: The results show that performance can differ considerably depending on the way detector output is compared to database annotations, i.e., beat-to-beat, segment-to-segment, or episode-to-episode comparison. Moreover, depending on the type of detector, the results substantiate that physiological and technical factors, e.g., changes in ECG morphology, rate of atrial premature beats, and noise level, can have a considerable influence on performance. CONCLUSION: The present study demonstrates overall strengths and weaknesses of different types of detector, highlights challenges in AF detection, and proposes five recommendations on how to handle data and characterize performance.
Butkuvienė et al. studied this question.
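To illustrate why the comparison mode matters, the following minimal Python sketch scores one detector output against one annotation sequence in two ways: beat-to-beat and episode-to-episode. It is not the study's implementation. The function names, the toy label sequences, and the episode matching rule (an annotated AF episode counts as detected if at least one of its beats is labelled AF by the detector) are assumptions made for illustration only. The sketch also computes accuracy to show how an imbalanced dataset can make a detector that never reports AF look acceptable.

```python
from itertools import groupby


def confusion(ref, det):
    """Beat-to-beat confusion counts; labels are 1 (AF) or 0 (non-AF)."""
    pairs = list(zip(ref, det))
    tp = sum(r == 1 and d == 1 for r, d in pairs)
    tn = sum(r == 0 and d == 0 for r, d in pairs)
    fp = sum(r == 0 and d == 1 for r, d in pairs)
    fn = sum(r == 1 and d == 0 for r, d in pairs)
    return tp, tn, fp, fn


def beat_sensitivity(ref, det):
    """Fraction of annotated AF beats that the detector labels as AF."""
    tp, _, _, fn = confusion(ref, det)
    return tp / (tp + fn) if (tp + fn) else float("nan")


def accuracy(ref, det):
    """Fraction of all beats labelled correctly (sensitive to class imbalance)."""
    tp, tn, fp, fn = confusion(ref, det)
    return (tp + tn) / (tp + tn + fp + fn)


def episodes(labels):
    """Return (start, end) index pairs, end exclusive, for each run of AF beats."""
    found, index = [], 0
    for value, run in groupby(labels):
        length = len(list(run))
        if value == 1:
            found.append((index, index + length))
        index += length
    return found


def episode_sensitivity(ref, det):
    """Fraction of annotated AF episodes overlapped by any detected AF beat.

    Assumed matching rule for this sketch; other overlap criteria exist.
    """
    ref_episodes = episodes(ref)
    if not ref_episodes:
        return float("nan")
    hits = sum(any(det[start:end]) for start, end in ref_episodes)
    return hits / len(ref_episodes)


# Toy annotation: 200 beats with two AF episodes (20 beats and 4 beats).
ref = [0] * 60 + [1] * 20 + [0] * 60 + [1] * 4 + [0] * 56
# Toy detector: finds the long episode 5 beats late, misses the short one.
det = [0] * 65 + [1] * 15 + [0] * 120
# Trivial detector that never reports AF.
never_af = [0] * 200

print("beat-to-beat sensitivity:", beat_sensitivity(ref, det))
print("episode-to-episode sensitivity:", episode_sensitivity(ref, det))
print("accuracy of detector:", accuracy(ref, det))
print("accuracy of never-AF detector:", accuracy(ref, never_af))
```

In this toy case the same detector output yields a beat-to-beat sensitivity of 0.625 but an episode-to-episode sensitivity of 0.5, because the missed short episode weighs as much as the long one at episode level. The never-AF detector still reaches an accuracy of 0.88, since only 24 of the 200 beats are AF.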