Three AI models for automated CMR segmentation showed strong agreement with expert measurements (r > 0.8), but produced clinically relevant differences across cardiac regions and disease groups.
Observational (n=346)
346 cases including dilated cardiomyopathy, left ventricular hypertrophy, healthy volunteers, and other cardiac diseases.
Three AI models (two commercial, one research) for automated post-processing of short-axis cine images in cardiovascular magnetic resonance imaging
Expert-derived measurements
Clinical parameter agreement between AI-derived and expert-derived ventricular volumes and left ventricular mass (LVM), evaluated using correlations and mean differences (surrogate)
While AI solutions for CMR segmentation show high overall agreement with experts, they are not interchangeable and can produce clinically relevant differences depending on the cardiac region and disease.
Effect estimate: r > 0.8
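The two agreement measures named above, correlation and mean difference, capture different things: a model can correlate almost perfectly with expert values and still carry a systematic offset. The following is a minimal sketch of that distinction, not the study's analysis code. The function names and the volume values are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_difference(ai, expert):
    """Mean of (AI - expert); positive values mean the AI overestimates."""
    if len(ai) != len(expert) or not ai:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(a - e for a, e in zip(ai, expert)) / len(ai)


# Hypothetical end-diastolic volumes in mL (illustrative values, not study data).
expert_edv = [100.0, 120.0, 140.0, 160.0, 180.0]
ai_edv = [110.0, 130.0, 150.0, 170.0, 190.0]

r = pearson_r(ai_edv, expert_edv)
bias = mean_difference(ai_edv, expert_edv)
print(f"r = {r:.2f}, mean difference = {bias:.1f} mL")
```

In this constructed example the correlation is perfect while every AI value sits 10 mL above the expert value, which is why a high r alone does not show that two methods are interchangeable.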
Abstract: Automated segmentation of cardiac magnetic resonance (CMR) imaging is integrated into clinical workflows, yet comparative performance across vendor AI solutions remains insufficiently characterized. This study assessed three models (two commercial, one research) for short-axis cine segmentation in a diverse cohort of 346 cases, including dilated cardiomyopathy (DCM), left ventricular hypertrophy (LVH), healthy volunteers, and other cardiac diseases. Clinical parameter agreement between AI-derived and expert-derived ventricular volumes and left ventricular mass (LVM) was evaluated using correlations and mean differences, segmentation agreement was evaluated with the Dice coefficient, and slice detection was characterized with false positive and false negative rates (FPR/FNR). Papillary muscle (PM) inclusion was examined with subgroup analyses. AI-derived clinical parameters agreed strongly with expert measurements (r > 0.8). Nevertheless, inter-model biases included differing ventricular volume estimates. Midventricular segmentation was reliable (Dice > 80%), whereas apical segmentation was poor (Dice < 65%), although with minor impact on area (< 1 cm²). Basal slice detection varied substantially, with AI1 and AI2 over-detecting and AI3 under-detecting slices (e.g. RV FPR: AI1 24%, AI2 14%, AI3 FNR: 32%), producing large area differences. Because it excludes PM, AI2 overestimated volumes and underestimated LVM, particularly in LVH cases. While AI-expert agreement is high, AI solutions are not interchangeable and produce clinically relevant differences from expert measurements across cardiac regions and disease groups.
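The Dice coefficient and the slice-detection rates mentioned in the abstract can be illustrated as follows. This is a sketch under stated assumptions, not the authors' implementation: the abstract does not give the exact definitions, so the denominators used for FPR and FNR, the convention that two empty masks score 1.0, and all function names and toy data are hypothetical.

```python
def dice(mask_a, mask_b):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary 2D masks."""
    a = [bool(v) for row in mask_a for v in row]
    b = [bool(v) for row in mask_b for v in row]
    if len(a) != len(b):
        raise ValueError("masks must have the same number of pixels")
    total = sum(a) + sum(b)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    overlap = sum(x and y for x, y in zip(a, b))
    return 2.0 * overlap / total


def slice_detection_rates(ai_has_contour, expert_has_contour):
    """Return (FPR, FNR) per slice, treating the expert as the reference.

    Assumed definitions: FPR = FP / (FP + TN), FNR = FN / (FN + TP).
    """
    pairs = list(zip(ai_has_contour, expert_has_contour))
    fp = sum(1 for ai, ex in pairs if ai and not ex)
    tn = sum(1 for ai, ex in pairs if not ai and not ex)
    fn = sum(1 for ai, ex in pairs if not ai and ex)
    tp = sum(1 for ai, ex in pairs if ai and ex)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


# Toy 3x3 masks: the expert contour covers 4 pixels, the AI contour 3 of them.
expert_mask = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
ai_mask = [[0, 1, 1], [0, 1, 0], [0, 0, 0]]
dice_score = dice(ai_mask, expert_mask)

# Toy stack of 5 slices: the AI contours one extra slice that the expert left empty.
expert_slices = [False, True, True, True, False]
ai_slices = [True, True, True, True, False]
rates = slice_detection_rates(ai_slices, expert_slices)
```

Here the Dice score is 6/7 (about 0.86) and the slice rates are FPR 0.5 and FNR 0.0, the over-detection pattern the abstract reports for AI1 and AI2 at the base. A per-slice Dice can look acceptable while a wrongly included or omitted slice still shifts the resulting volume considerably.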
Thomas Hadler
Max Delbrück Center
Clemens Ammann
University of Bern
Hadil Saad
Max Delbrück Center
Scientific Reports
Humboldt-Universität zu Berlin
Max Delbrück Center
Siemens (Germany)
Hadler et al. conducted an observational study in a cohort comprising dilated cardiomyopathy, left ventricular hypertrophy, healthy volunteers, and other cardiac diseases (n=346). Three AI models (two commercial, one research) for short-axis cine segmentation were compared with expert-derived measurements on clinical parameter agreement for ventricular volumes and left ventricular mass. The models showed strong agreement with expert measurements (r > 0.8), but produced clinically relevant differences across cardiac regions and disease groups.
synapsesocial.com/papers/6a211781d499ed480b170643 — DOI: https://doi.org/10.1038/s41598-026-54182-z