What is the clinical evidence from this study?

Study design: Other. Population: Cardiovascular disease (n=148). Intervention: Convolutional neural networks (U-Net, FCN, MultiResUNet) vs. expert manual segmentation. Primary outcome: Correlation with expert segmentations for quantitative clinical parameters (Fisher z-transformed correlation, rz' = 0.978).

April 18, 2023 · Open Access

Multilevel comparison of deep learning models for function quantification in cardiovascular magnetic resonance: On the redundancy of architectural variations

Key Result

U-Net, FCN, and MultiResUNet deep learning models showed strong correlation (rz' ≥ 0.977) with expert segmentation for cardiac function quantification, indicating architectural modifications were not critical to performance.

Structured PICO

Do different CNN architectures (U-Net, FCN, MultiResUNet) differ in their accuracy for quantifying ventricular function on CMR compared to expert segmentation?

Population

148 patients from clinical routine undergoing cardiovascular magnetic resonance (CMR) for indications including coronary artery disease, cardiomyopathies, myocarditis, valvular heart disease, and cardiac mass.

Intervention

Three convolutional neural network (CNN) models (U-Net, FCN, MultiResUNet) trained for the automated segmentation of the left and right ventricles on short-axis cine images.
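The clinical parameters listed under Outcome are derived from these segmentations. The page does not describe how the study computed volumes. A common approach, assumed here, is summation of discs: each short-axis slice contributes its segmented area times the slice step. The following is a minimal sketch using plain Python lists as masks. The function names, pixel spacing, and slice step are all hypothetical.

```python
from typing import Sequence, Tuple

# A mask is a 2D grid of 0/1 values (1 = ventricular blood pool), one per short-axis slice.
Mask = Sequence[Sequence[int]]


def slice_area_mm2(mask: Mask, pixel_spacing_mm: Tuple[float, float]) -> float:
    """Segmented area of one slice: foreground pixel count times in-plane pixel area."""
    n_pixels = sum(sum(row) for row in mask)
    return n_pixels * pixel_spacing_mm[0] * pixel_spacing_mm[1]


def volume_ml(masks: Sequence[Mask], pixel_spacing_mm: Tuple[float, float],
              slice_step_mm: float) -> float:
    """Summation of discs (assumed method): sum of slice areas times slice step, in ml."""
    total_mm3 = sum(slice_area_mm2(m, pixel_spacing_mm) * slice_step_mm for m in masks)
    return total_mm3 / 1000.0  # 1 ml = 1000 mm^3


def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Ejection fraction in percent: (EDV - ESV) / EDV * 100."""
    return 100.0 * (edv_ml - esv_ml) / edv_ml


if __name__ == "__main__":
    # Toy data (hypothetical): two slices per phase, 1 mm pixels, 10 mm slice step.
    ed_masks = [[[1] * 10 for _ in range(10)] for _ in range(2)]  # 100 px per slice
    es_masks = [[[1] * 8 for _ in range(5)] for _ in range(2)]    # 40 px per slice
    edv = volume_ml(ed_masks, (1.0, 1.0), 10.0)
    esv = volume_ml(es_masks, (1.0, 1.0), 10.0)
    print(f"EDV={edv:.1f} ml, ESV={esv:.1f} ml, EF={ejection_fraction(edv, esv):.1f}%")
```

Using the distance between slice centres (thickness plus gap) as the step, rather than slice thickness alone, avoids underestimating volume when slices are spaced apart. The page does not state which convention the study used.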

Comparator

Manual segmentations by a trained physician (expert).

Outcome

Segmentation accuracy, evaluated at the contour level and in terms of quantitative clinical parameters (LVEF, LVEDV, LVESV, LVM, RVEF, RVEDV, RVESV) and geometric segmentation metrics (Dice similarity coefficient, Hausdorff distance). Outcome type: surrogate.
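The two geometric metrics above can be computed from binary masks. The following is a minimal pure-Python sketch, not the study's evaluation code. The function names are hypothetical, and pixels are assumed isotropic (one `spacing_mm` value). The Hausdorff distance is computed by brute force over all foreground pixels. Contour-based evaluations usually restrict the point sets to boundary pixels and use faster algorithms.

```python
import math
from typing import Sequence, Set, Tuple

Point = Tuple[int, int]


def mask_to_points(mask: Sequence[Sequence[int]]) -> Set[Point]:
    """(row, col) coordinates of the foreground pixels of a 2D 0/1 mask."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


def dice(a: Set[Point], b: Set[Point]) -> float:
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|), from 0 (no overlap) to 1."""
    if not a and not b:
        return 1.0  # assumed convention: two empty masks count as perfect agreement
    return 2 * len(a & b) / (len(a) + len(b))


def hausdorff(a: Set[Point], b: Set[Point], spacing_mm: float = 1.0) -> float:
    """Symmetric Hausdorff distance in mm between two non-empty point sets (brute force)."""

    def directed(src: Set[Point], dst: Set[Point]) -> float:
        # Largest distance from any point in src to its nearest neighbour in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return spacing_mm * max(directed(a, b), directed(b, a))


if __name__ == "__main__":
    expert = mask_to_points([[1, 1, 0], [1, 1, 0]])
    cnn = mask_to_points([[0, 1, 1], [0, 1, 1]])  # same shape shifted one pixel right
    print(dice(expert, cnn), hausdorff(expert, cnn, spacing_mm=1.5))  # prints 0.5 1.5
```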

Modifications to CNN architectures (U-Net, FCN, MultiResUNet) do not significantly improve the quality of cardiac function quantification in CMR, as all models show similar strong correlations with expert segmentation but share common errors in basal and apical slices.

Main Result

Effect estimate: rz' = 0.978 (U-Net; 0.977 for FCN and 0.978 for MultiResUNet)
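The page does not explain how rz' is formed. This sketch assumes one common approach: compute a Pearson correlation between CNN and expert values for each clinical parameter, average those correlations on Fisher's z scale (z = atanh r), and transform the mean back with tanh. The function names are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between paired CNN and expert values for one parameter."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_mean(rs: Sequence[float]) -> float:
    """Average correlations on Fisher's z scale and transform back.

    z = atanh(r) for each r, take the arithmetic mean of the z values,
    then return tanh(mean). math.atanh raises ValueError for |r| = 1.
    """
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))


if __name__ == "__main__":
    # Hypothetical per-parameter correlations (e.g. LVEDV, LVESV, LVEF), not study data.
    print(round(fisher_z_mean([0.99, 0.97, 0.95]), 3))
```

Averaging on the z scale rather than averaging r directly reduces the bias caused by the bounded, skewed sampling distribution of r near ±1. This matters when all correlations are as high as those reported here.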

Limitations

Single-center and single-vendor dataset
Omission of extensive hyperparameter tuning in favor of unaltered reproduction of published architectures
Does not provide a fully comprehensive comparison of novel architecture variants

Abstract

Background: Cardiac function quantification in cardiovascular magnetic resonance requires precise contouring of the heart chambers. This time-consuming task is increasingly being addressed by a plethora of ever more complex deep learning methods. However, only a small fraction of these have made their way from academia into clinical practice. In the quality assessment and control of medical artificial intelligence, the opaque reasoning and associated distinctive errors of neural networks meet an extraordinarily low tolerance for failure.

Aim: The aim of this study is a multilevel analysis and comparison of the performance of three popular convolutional neural network (CNN) models for cardiac function quantification.

Methods: U-Net, FCN, and MultiResUNet were trained for the segmentation of the left and right ventricles on short-axis cine images of 119 patients from clinical routine. The training pipeline and hyperparameters were kept constant to isolate the influence of network architecture. CNN performance was evaluated against expert segmentations for 29 test cases on contour level and in terms of quantitative clinical parameters. Multilevel analysis included breakdown of results by slice position, as well as visualization of segmentation deviations and linkage of volume differences to segmentation metrics via correlation plots for qualitative analysis.

Results: All models showed strong correlation to the expert with respect to quantitative clinical parameters (rz' = 0.978, 0.977, and 0.978 for U-Net, FCN, and MultiResUNet, respectively). The MultiResUNet significantly underestimated ventricular volumes and left ventricular myocardial mass. Segmentation difficulties and failures clustered in basal and apical slices for all CNNs, with the largest volume differences in the basal slices (mean absolute error per slice: 4.2 ± 4.5 ml for basal, 0.9 ± 1.3 ml for midventricular, 0.9 ± 0.9 ml for apical slices). Results for the right ventricle had higher variance and more outliers compared to the left ventricle. Intraclass correlation for clinical parameters was excellent (≥0.91) among the CNNs.

Conclusion: Modifications to CNN architecture were not critical to the quality of error for our dataset. Despite good overall agreement with the expert, errors accumulated in basal and apical slices for all models.
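This sketch outlines the slice-level breakdown in the Results. It groups absolute per-slice volume differences between CNN and expert by slice position and reports mean ± standard deviation for each group. The caller supplies the position labels, because the abstract does not state how basal, midventricular, and apical slices were assigned. The function name and the use of the sample standard deviation are assumptions.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def mae_by_position(
    cnn_ml: Sequence[float],
    expert_ml: Sequence[float],
    positions: Sequence[str],
) -> Dict[str, Tuple[float, float]]:
    """Mean and sample SD of |CNN - expert| per-slice volume differences, per position.

    The three sequences are aligned slice by slice; positions holds labels such as
    "basal", "midventricular" or "apical" (assignment rule left to the caller).
    """
    groups: Dict[str, List[float]] = defaultdict(list)
    for cnn, expert, pos in zip(cnn_ml, expert_ml, positions):
        groups[pos].append(abs(cnn - expert))
    return {
        # A group with a single slice has no sample SD; 0.0 is reported instead.
        pos: (statistics.mean(diffs), statistics.stdev(diffs) if len(diffs) > 1 else 0.0)
        for pos, diffs in groups.items()
    }


if __name__ == "__main__":
    # Hypothetical per-slice volumes in ml for one case; not values from the study.
    cnn = [10.0, 5.0, 5.0, 2.0]
    expert = [6.0, 5.5, 4.5, 2.0]
    labels = ["basal", "midventricular", "midventricular", "apical"]
    for pos, (mean, sd) in mae_by_position(cnn, expert, labels).items():
        print(f"{pos}: {mean:.1f} ± {sd:.1f} ml")
```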

Authors

Clemens Ammann

University of Bern

Thomas Hadler

Max Delbrück Center

Jan Gröschel

Max Delbrück Center

Journals

Frontiers in Cardiovascular Medicine

Institutions

Charité - Universitätsmedizin Berlin

Humboldt-Universität zu Berlin

Freie Universität Berlin

Cite this study

Ammann et al. (2023) compared convolutional neural networks (U-Net, FCN, MultiResUNet) with expert manual segmentation in 148 patients with cardiovascular disease (study design: other). The outcome was correlation with expert segmentations for quantitative clinical parameters (Fisher z-transformed correlation, rz' = 0.978). U-Net, FCN, and MultiResUNet showed strong correlation (rz' ≥ 0.977) with expert segmentation for cardiac function quantification, indicating that architectural modifications were not critical to performance.

synapsesocial.com/papers/6a16c6607cba52b0f77b948f — DOI: https://doi.org/10.3389/fcvm.2023.1118499

Also consider

Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context:

Introduction of Lazy Luna an automatic software-driven multilevel comparison of ventricular function quantification in cardiovascular magnetic resonance imaging · 2022 · 16 citations
Deep Learning–based Method for Fully Automatic Quantification of Left Ventricle Function from Cine MR Images: A Multivendor, Multicenter Study · 2018 · 207 citations
Fast acquisition of left and right ventricular function parameters applying cardiovascular magnetic resonance in clinical routine – validation of a 2-shot compressed sensing cine sequence · 2022 · 12 citations
Disentangle, Align and Fuse for Multimodal and Semi-Supervised Image Segmentation · 2020 · 63 citations
Untitled · 66 citations
