What is the clinical evidence from this study?

Study design: Observational. Population: patients with atrial fibrillation (n=107, cardioversion cohort). Intervention: DeepBeat vs. single-task model. Primary outcome: atrial fibrillation detection (RR 2.00, 95% CI 1.55-2.50, p<0.001).

What does this research mean for the field?

DeepBeat achieved a sensitivity of 0.98 for atrial fibrillation detection compared to a sensitivity of 0.49 for the single-task model. Novelty: novel finding. Consensus alignment: challenges consensus.

September 9, 2020 · Open Access

Multi-task deep learning for cardiac rhythm detection in wearable devices

Q: What are the key findings of this study?

DeepBeat achieved a sensitivity of 0.98 for atrial fibrillation detection compared to a sensitivity of 0.49 for the single-task model.

Study Design

Type

Observational (n=107)

Multicenter

Structured PICO

Does a multitask deep learning model (DeepBeat) improve the detection of atrial fibrillation from wearable photoplethysmography devices compared to single-task models?

Population

163 individuals (107 undergoing elective cardioversion, 41 undergoing exercise stress test, 15 ambulatory) plus a publicly available IEEE dataset, providing over 500,000 labeled 25-second photoplethysmography (PPG) signals. Cardioversion cohort: mean age 68, 85/22 M/F. Exercise stress test cohort: mean age 56, 26/14 M/F. Ambulatory cohort: mean age 67, 11/4 M/F.
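The labeled data are described as fixed 25-second PPG windows. A minimal sketch of how a continuous recording could be cut into such windows follows; it assumes non-overlapping segments and a hypothetical 32 Hz sampling rate, neither of which is stated in this summary.

```python
from typing import List, Sequence

WINDOW_SECONDS = 25  # window length stated in the summary


def split_into_windows(signal: Sequence[float], fs_hz: int,
                       window_s: int = WINDOW_SECONDS) -> List[List[float]]:
    """Split a 1-D PPG trace into consecutive, non-overlapping windows.

    Trailing samples that do not fill a complete window are dropped.
    Non-overlapping windows and the fs_hz value are assumptions for illustration.
    """
    if fs_hz <= 0 or window_s <= 0:
        raise ValueError("fs_hz and window_s must be positive")
    size = fs_hz * window_s  # samples per window
    n_full = len(signal) // size  # number of complete windows
    return [list(signal[i * size:(i + 1) * size]) for i in range(n_full)]


if __name__ == "__main__":
    fs = 32  # hypothetical sampling rate
    trace = [0.0] * (fs * 60)  # one minute of placeholder samples
    windows = split_into_windows(trace, fs)
    print(len(windows), len(windows[0]))  # 2 800
```

Dropping the incomplete tail keeps every input the same length, which a fixed-size network input requires; padding or overlapping windows would be equally plausible alternatives.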

Intervention

DeepBeat, a multitask deep learning method (convolutional neural network) using unsupervised transfer learning through convolutional denoising autoencoders (CDAE) to jointly assess signal quality and classify atrial fibrillation from 25-second wearable photoplethysmography (PPG) windows.
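The core multitask idea (one shared encoder, two task heads, one summed loss) can be illustrated with a dependency-free forward pass. This is a hedged sketch, not DeepBeat's architecture: the layer sizes, the three hypothetical signal-quality classes, the binary AF head, and the `alpha` task weight are all assumptions, and random initialisation stands in for the CDAE-pretrained weights.

```python
import math
import random
from typing import List, Sequence, Tuple


def conv1d_relu(x: Sequence[float], kernels: List[List[float]]) -> List[List[float]]:
    """'Valid' 1-D cross-correlation with each kernel, followed by ReLU."""
    maps = []
    for k in kernels:
        width = len(k)
        maps.append([max(0.0, sum(k[j] * x[i + j] for j in range(width)))
                     for i in range(len(x) - width + 1)])
    return maps


def global_avg_pool(maps: List[List[float]]) -> List[float]:
    """Average each feature map over time: one feature per kernel."""
    return [sum(m) / len(m) for m in maps]


def dense_softmax(features: Sequence[float], weights: List[List[float]]) -> List[float]:
    """Bias-free linear layer followed by a numerically stable softmax."""
    logits = [sum(w * f for w, f in zip(row, features)) for row in weights]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class MultitaskSketch:
    """Shared convolutional encoder feeding two task-specific heads (hypothetical sizes)."""

    def __init__(self, n_kernels: int = 4, width: int = 5,
                 n_quality: int = 3, seed: int = 0) -> None:
        rng = random.Random(seed)

        def rand() -> float:
            return rng.uniform(-0.5, 0.5)

        # Random init stands in for CDAE-pretrained encoder weights.
        self.kernels = [[rand() for _ in range(width)] for _ in range(n_kernels)]
        self.quality_head = [[rand() for _ in range(n_kernels)] for _ in range(n_quality)]
        self.rhythm_head = [[rand() for _ in range(n_kernels)] for _ in range(2)]  # AF / not AF

    def forward(self, window: Sequence[float]) -> Tuple[List[float], List[float]]:
        shared = global_avg_pool(conv1d_relu(window, self.kernels))  # shared features
        return dense_softmax(shared, self.quality_head), dense_softmax(shared, self.rhythm_head)


def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Negative log-probability of the target class (clamped to avoid log(0))."""
    return -math.log(max(probs[target], 1e-12))


def joint_loss(model: MultitaskSketch, window: Sequence[float],
               quality_label: int, rhythm_label: int, alpha: float = 1.0) -> float:
    """Multitask objective: quality loss plus alpha-weighted rhythm loss."""
    quality_probs, rhythm_probs = model.forward(window)
    return (cross_entropy(quality_probs, quality_label)
            + alpha * cross_entropy(rhythm_probs, rhythm_label))


if __name__ == "__main__":
    fs = 32  # hypothetical sampling rate, so a 25 s window has 800 samples
    window = [math.sin(2 * math.pi * 1.2 * t / fs) for t in range(25 * fs)]
    q, r = MultitaskSketch().forward(window)
    print(len(q), len(r))  # 3 2
```

Because both heads read the same pooled features, gradients from the quality loss would also shape the shared kernels during training. In this sketch, a single-task comparator corresponds to dropping `quality_head` and its loss term.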

Comparator

Single-task learning models (predicting only AF without signal quality assessment), models without CDAE pretraining (random initialization), Random Forest, and a 1D VGG16 architecture.

Outcome

Atrial fibrillation detection performance measured by F1 score, sensitivity, specificity, false-positive rate, and false-negative rate on 25-second windowed physiological signals.
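All five outcome measures follow from the window-level confusion matrix, with AF as the positive class. A minimal sketch (the counts in the usage example are hypothetical, not taken from the study):

```python
from typing import Dict


def _ratio(num: int, den: int) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def af_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Window-level metrics from confusion-matrix counts, with AF as the positive class."""
    return {
        "sensitivity": _ratio(tp, tp + fn),          # true-positive rate (recall)
        "specificity": _ratio(tn, tn + fp),          # true-negative rate
        "false_positive_rate": _ratio(fp, fp + tn),  # equals 1 - specificity
        "false_negative_rate": _ratio(fn, fn + tp),  # equals 1 - sensitivity
        # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN)
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    m = af_metrics(tp=98, fp=10, tn=990, fn=2)
    print({k: round(v, 3) for k, v in m.items()})
```

F1 does not use true negatives, which makes it informative when AF windows are a minority class: a model that labels everything "not AF" can score high specificity but gets an F1 of zero.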

A multitask deep learning approach that jointly assesses signal quality and rhythm significantly improves the accuracy of atrial fibrillation detection from wearable PPG devices (F1 score 0.96 vs 0.54 for a single-task model).

Main Result

Effect estimate: RR 2.00 (95% CI 1.55-2.50)

Sensitivity: 0.98 vs 0.49 (DeepBeat vs single-task model)

p-value: p<0.001
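Assuming the reported RR is the ratio of DeepBeat's sensitivity to the single-task model's sensitivity (the summary does not define it), the point estimate follows directly from the two sensitivities of 0.98 and 0.49:

```latex
\mathrm{RR} = \frac{\mathrm{Se}_{\text{DeepBeat}}}{\mathrm{Se}_{\text{single-task}}} = \frac{0.98}{0.49} = 2.00
```

The 95% CI cannot be reproduced from the figures given here, because it depends on the underlying window counts.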

Limitations

The study focused only on atrial fibrillation as the abnormal rhythm.
The training data had a higher prevalence of arrhythmia than the general population.
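Prevalence matters because the positive predictive value (PPV) of an alert falls as AF becomes rarer, even when sensitivity and specificity are unchanged. The sketch below applies Bayes' rule using the replication-cohort sensitivity (0.98) and specificity (0.99) reported in the abstract; the prevalence values are hypothetical, and it assumes both rates transfer unchanged to a lower-prevalence population, which is not guaranteed.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Bayes' rule: P(AF | positive alert) at a given AF prevalence."""
    true_pos = sensitivity * prevalence                  # AF windows correctly flagged
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # non-AF windows wrongly flagged
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Sensitivity/specificity from the abstract; prevalence values are hypothetical.
    for prevalence in (0.5, 0.1, 0.01):
        ppv = positive_predictive_value(0.98, 0.99, prevalence)
        print(f"prevalence {prevalence}: PPV {ppv:.3f}")
```

At a hypothetical 50% prevalence the PPV is about 0.99, but at 1% roughly half of all alerts would be false positives.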

Abstract

Wearable devices enable theoretically continuous, longitudinal monitoring of physiological measurements such as step count, energy expenditure, and heart rate. Although the classification of abnormal cardiac rhythms such as atrial fibrillation from wearable devices has great potential, commercial algorithms remain proprietary and tend to focus on heart rate variability derived from green spectrum LED sensors placed on the wrist, where noise remains an unsolved problem. Here we develop DeepBeat, a multitask deep learning method to jointly assess signal quality and arrhythmia event detection in wearable photoplethysmography devices for real-time detection of atrial fibrillation. The model is trained on approximately one million simulated unlabeled physiological signals and fine-tuned on a curated dataset of over 500K labeled signals from over 100 individuals from 3 different wearable devices. We demonstrate that, in comparison with a single-task model, our architecture using unsupervised transfer learning through convolutional denoising autoencoders dramatically improves the performance of atrial fibrillation detection from an F1 score of 0.54 to 0.96. We also include in our evaluation a prospectively derived replication cohort of ambulatory participants where the algorithm performed with high sensitivity (0.98), specificity (0.99), and F1 score (0.93). We show that two-stage training can help address the unbalanced data problem common to biomedical applications, where large-scale well-annotated datasets are hard to generate due to the expense of manual annotation, data acquisition, and participant privacy.
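The first of the two training stages pretrains on unlabeled simulated signals with a denoising objective: corrupt a clean signal, and train the autoencoder to reconstruct the clean version. The sketch below shows only the data side of that objective (toy simulated pulses, corrupted/clean pairs, and the reconstruction loss). The waveform shape, Gaussian noise model, 32 Hz rate, and heart-rate range are all hypothetical and are not the authors' simulator; the convolutional autoencoder itself is omitted.

```python
import math
import random
from typing import List, Tuple


def simulated_pulse(n_samples: int, fs_hz: float, heart_rate_bpm: float) -> List[float]:
    """Toy PPG-like waveform: a fundamental plus a weaker second harmonic (hypothetical)."""
    f0 = heart_rate_bpm / 60.0  # beats per second
    return [math.sin(2 * math.pi * f0 * t / fs_hz)
            + 0.3 * math.sin(2 * math.pi * 2 * f0 * t / fs_hz)
            for t in range(n_samples)]


def denoising_pair(clean: List[float], noise_std: float,
                   rng: random.Random) -> Tuple[List[float], List[float]]:
    """Return (corrupted input, clean target) for a denoising-autoencoder objective."""
    noisy = [x + rng.gauss(0.0, noise_std) for x in clean]
    return noisy, clean


def reconstruction_mse(output: List[float], target: List[float]) -> float:
    """Mean squared error that the autoencoder would minimise during pretraining."""
    return sum((o - t) ** 2 for o, t in zip(output, target)) / len(target)


if __name__ == "__main__":
    rng = random.Random(42)
    clean = simulated_pulse(n_samples=800, fs_hz=32, heart_rate_bpm=rng.uniform(50, 150))
    noisy, target = denoising_pair(clean, noise_std=0.2, rng=rng)
    # A pass-through "autoencoder" that returns the noisy input scores roughly
    # noise_std**2 (about 0.04); pretraining aims to push the loss below that.
    print(round(reconstruction_mse(noisy, target), 3))
```

No labels are needed at this stage, which is what lets the abundant simulated data shape the encoder before the much smaller, imbalanced labeled set is used for fine-tuning.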
