What question did this study set out to answer?

This research aims to assess the performance of automated tumor segmentation and the value of uncertainty estimates in diffuse midline glioma cases.

June 26, 2026 | Open Access

ID #995 Automated Segmentation Performance and Uncertainty in Pediatric Diffuse Midline Gliomas using Imaging Biomarkers

Key Points

Segmented whole tumor on multi-contrast MRIs (n=403) from an international cohort of 107 DMG patients.
Classified segmentation performance as acceptable (Dice ≥ 0.8) or poor (Dice < 0.8).
Investigated human segmentor contour uncertainty with eye-tracking in 36 annotators across 12 slices.
Median Dice score was 0.77–0.81, indicating generally good performance.
Automated segmentations altered 20% of manual response labels, primarily misclassifying stable or progressive disease as partial response due to undersegmentation.
A combined model integrating deep learning attention and human gaze explained 39% of uncertainty variance.
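The Dice-based split into acceptable and poor segmentations can be illustrated with a minimal NumPy sketch. It computes the standard Dice coefficient, 2|A∩B| / (|A| + |B|), on binary masks and applies the 0.8 cut-off. The function names are hypothetical, and assigning a score of exactly 0.8 to the acceptable class is an assumption, since the inclusive side of the threshold is not spelled out here. The toy masks mimic undersegmentation: the automated mask lies entirely inside the reference but misses half of it.

```python
import numpy as np


def dice_score(pred: np.ndarray, ref: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    ref = ref.astype(bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        # Both masks empty: treated as perfect agreement (a convention, not from the study).
        return 1.0
    return float(2.0 * np.logical_and(pred, ref).sum() / denom)


def performance_class(dice: float, threshold: float = 0.8) -> str:
    """Hypothetical helper: dichotomize at 0.8; exactly 0.8 counts as 'acceptable' (assumption)."""
    return "acceptable" if dice >= threshold else "poor"


# Toy 3D example of undersegmentation: the automated mask misses part of the reference tumor.
ref = np.zeros((10, 10, 10), dtype=bool)
ref[2:8, 2:8, 2:8] = True    # reference tumor: 216 voxels
pred = np.zeros_like(ref)
pred[2:8, 2:8, 2:5] = True   # automated mask: 108 voxels, all inside the reference

d = dice_score(pred, ref)
print(round(d, 3), performance_class(d))  # 0.667 poor
```

Because undersegmentation shrinks the predicted mask without adding false positives, Dice drops to 2·108 / (216 + 108) ≈ 0.667, which falls in the poor class.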

Abstract

Background: MRI-based tumor segmentation could greatly support clinical assessment of diffuse midline glioma (DMG), yet translation of automated methods remains constrained by occasional model failures, because neither the performance required for clinical utility nor the value of uncertainty estimates for detecting meaningful errors is established. We systematically evaluated segmentation performance prediction, response label stability, and uncertainty estimation.

Methods: Whole tumor was segmented in a multicentric, international cohort of pre- and post-therapy multi-contrast MRIs (n = 403) from 107 DMG patients. Segmentations by a state-of-the-art deep learning model were dichotomized by Dice score into acceptable (Dice ≥ 0.8) and poor (Dice < 0.8). We analyzed classification of segmentation performance from image-derived features (imaging metadata, radiomic features, and 3D brain MRI foundation model embeddings) and compared response assessments derived from manual vs. automated segmentations (n = 51 patients with longitudinal follow-up). In an eye-tracking sub-study, we further quantified contour uncertainty among 36 human annotators on 12 slices and contextualized it with observer gaze patterns.

Results: Despite generally good performance (median Dice = 0.77–0.81), auto-segmented volumes altered 20% of trajectory-based manual response labels (n = 10), predominantly misclassifying stable or progressive disease as partial response due to undersegmentation of post-treatment scans. Segmentation performance was best classified by combining whole-image foundation model embeddings with segmented tumor volume (ROC AUC = 0.81 ± 0.05). Segmentation error correlated strongly with human contour uncertainty (|r| = 0.9), supporting model-based uncertainty as a proxy for annotation difficulty. Image-derived attention features from deeper encoder layers explained substantially more uncertainty variance than eye-tracking features alone (R² = 24% vs. 2%). Human gaze attention overlapped most with U-Net bottleneck activations (Dice = 0.6). A combined model integrating model attention and human visual behavior explained 39% of uncertainty variance.

Conclusions: Together, these results support integrating performance- and uncertainty-aware segmentation frameworks to enable safe clinical deployment, scalable quality assurance, and reliable endpoint extraction from automated tumor segmentations in DMG.
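How undersegmentation of a post-treatment scan can flip a response label is easiest to see with a toy volumetric rule. This is a sketch under stated assumptions: the function `response_label` and its thresholds (a 65% volume decrease for partial response, a 40% increase for progression) are hypothetical placeholders rather than the study's criteria, and it compares a single follow-up with baseline instead of reproducing the trajectory-based labeling described above.

```python
def response_label(baseline_ml: float, followup_ml: float,
                   pr_drop: float = 0.65, pd_rise: float = 0.40) -> str:
    """Toy volumetric response label from a baseline and one follow-up tumor volume.

    pr_drop and pd_rise are hypothetical thresholds, not the study's response criteria.
    """
    change = (followup_ml - baseline_ml) / baseline_ml  # relative volume change
    if change <= -pr_drop:
        return "partial response"
    if change >= pd_rise:
        return "progressive disease"
    return "stable disease"


baseline = 20.0         # ml; assume both pipelines segment the baseline scan accurately
manual_followup = 18.0  # ml; manual post-treatment volume (-10%)
auto_followup = 6.0     # ml; automated volume after undersegmenting the post-treatment scan (-70%)

print(response_label(baseline, manual_followup))  # stable disease
print(response_label(baseline, auto_followup))    # partial response
```

Because the automated follow-up volume is too small, the relative change crosses the partial-response threshold even though the manual volume indicates stable disease, which is the direction of error the Results report.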
