Although R2*-MRI is extensively validated for assessing liver iron concentration (LIC), robust and automatic liver segmentation techniques remain underdeveloped, which hampers clinical workflow and the adoption of automated LIC reporting. This study therefore aims to develop a generalized U-Net model, using limited labelled data, that automates whole-liver parenchyma segmentation on magnitude images and R2* maps across breath-hold and free-breathing MRI sequences, in order to improve the clinical workflow and accuracy of R2*-based LIC assessment. MRI images of twenty-nine patients with hepatic iron overload (22 females; median age: 15 (5–25) years) were obtained using multi-echo 2D Cartesian, 3D Cartesian, and 3D radial GRE sequences at 1.5 T. A generalized 2D U-Net (Mag-U-Net) was trained with incremental learning to segment the liver on magnitude images from the three sequences. Transfer learning was then applied to segment the liver on R2* maps (R2*-U-Net). Segmentation accuracy was evaluated using the Dice score. A Frangi filter was used to exclude blood vessels. Repeated-measures ANOVA with Tukey's post-hoc tests compared the Dice scores of Mag-U-Net with those of individual sequence-specific U-Nets. Pairwise t-tests compared the Dice scores of Mag-U-Net and R2*-U-Net against those of U-Nets trained without incremental and transfer learning. One-way ANOVA evaluated the accuracy of liver area, mean R2*, and fat fraction (FF). Linear regression and Bland-Altman plots were used to analyze agreement between manual and automated R2* and FF estimates. Statistical significance was set at p < 0.05. Both models achieved high liver segmentation accuracy (mean Dice scores ≥ 0.91 for Mag-U-Net; ≥ 0.84 for R2*-U-Net), significantly outperforming baseline models trained without incremental or transfer learning (p ≤ 0.046). Mag-U-Net significantly outperformed the individual sequence-specific U-Nets (p ≤ 0.036). Liver area, R2*/LIC, and FF estimates from Mag-U-Net and R2*-U-Net did not differ significantly from those obtained with ground-truth segmentation (p ≥ 0.71).
Strong correlations (R² ≥ 0.97) were observed between ground-truth and generalized U-Net estimates of mean R2* and FF, with mean biases (limits of agreement) of ≤ 4.09% (≤ ±8.11%) and ≤ 2.58% (≤ ±12.12%), respectively. The generalized U-Net enables a fully automated LIC assessment pipeline, improving workflow and facilitating broader adoption in clinical practice.
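The two agreement measures reported above, the Dice score and the Bland-Altman bias with limits of agreement, can be illustrated with a minimal NumPy sketch. This is not the study's code. The function names `dice_score` and `bland_altman_percent` are hypothetical, the percent difference is assumed to be taken relative to the mean of the two measurements, and the limits are assumed to be the bias ± 1.96 standard deviations. The abstract does not state which conventions the authors used.

```python
import numpy as np


def dice_score(pred, truth):
    """Dice overlap between two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        # Both masks empty: treated as perfect agreement (a convention, assumed here).
        return 1.0
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)


def bland_altman_percent(manual, automated):
    """Return (bias, lower limit, upper limit) of percent differences.

    Assumption: percent difference = 100 * (automated - manual) / pairwise mean,
    and limits of agreement = bias +/- 1.96 * sample standard deviation.
    """
    manual = np.asarray(manual, dtype=float)
    automated = np.asarray(automated, dtype=float)
    pct_diff = 100.0 * (automated - manual) / ((automated + manual) / 2.0)
    bias = float(pct_diff.mean())
    half_width = 1.96 * float(pct_diff.std(ddof=1))
    return bias, bias - half_width, bias + half_width


if __name__ == "__main__":
    # Toy masks and toy R2* values, invented purely for illustration.
    truth_mask = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 0]])
    pred_mask = np.array([[1, 1, 0], [1, 0, 0], [0, 0, 0]])
    print("Dice:", dice_score(pred_mask, truth_mask))

    manual_r2s = [120.0, 340.0, 560.0, 810.0]
    auto_r2s = [118.0, 352.0, 549.0, 823.0]
    print("Bias and limits (%):", bland_altman_percent(manual_r2s, auto_r2s))
```

In practice, the Dice score would be computed per subject on the predicted and manually drawn liver masks, and the Bland-Altman statistics on the per-subject mean R2* or FF values extracted from those masks.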
Shrestha et al. (Thu,) studied this question.