What question did this study set out to answer?

This work investigates whether multiple annotations per case are needed for effective training of uncertainty-aware image segmentation models.

May 1, 2026

Analysis of annotation requirements for training uncertainty-aware image segmentation models.

Key Points

Nine diverse publicly available datasets were used.
The nnU-Net framework was extended with Ensemble, Bayesian, Probabilistic, and Hierarchical Probabilistic models.
Training with single- versus multi-annotator annotations was compared using A-Dice, GED, and Dice.
Ensemble-based probability map approaches consistently outperformed other methods and performed comparably under single- and multi-annotation settings.
For the disagreement-as-class approach, the multi-annotation setting captured inter-expert variability significantly better than the single-annotation setting, particularly in uncertainty-class segmentation.
Analytical arguments support the numerical findings, indicating that multiple annotations per case may not be strictly necessary.
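
To make two of the metrics named above concrete, the following is a minimal sketch of Dice and GED for binary masks. It assumes the commonly used squared form of GED, 2 E[d(S, Y)] - E[d(S, S')] - E[d(Y, Y')], with d = 1 - IoU; whether the study uses exactly this variant is an assumption, and A-Dice is not covered here. The function names `dice`, `iou_distance`, and `ged_squared` are illustrative and do not come from the paper.

```python
import numpy as np

def dice(a, b):
    """Dice overlap between two binary masks; two empty masks score 1.0."""
    a, b = a.astype(bool), b.astype(bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / total

def iou_distance(a, b):
    """Distance d = 1 - IoU; two empty masks have distance 0.0."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return 1.0 - np.logical_and(a, b).sum() / union

def ged_squared(samples, annotations):
    """Squared GED between model samples and expert annotations (assumed form)."""
    cross = np.mean([iou_distance(s, y) for s in samples for y in annotations])
    within_s = np.mean([iou_distance(s, t) for s in samples for t in samples])
    within_y = np.mean([iou_distance(y, z) for y in annotations for z in annotations])
    return 2.0 * cross - within_s - within_y

# Toy 2x2 masks standing in for two expert annotations of one case.
rater_1 = np.array([[1, 1], [0, 0]])
rater_2 = np.array([[1, 0], [0, 0]])
disjoint = np.array([[0, 0], [1, 1]])

print(dice(rater_1, rater_2))
print(ged_squared([rater_1, rater_2], [rater_1, rater_2]))
print(ged_squared([rater_1], [disjoint]))
```

Under this definition, a model whose samples reproduce the annotator distribution exactly scores 0, and lower values indicate a closer match to the observed inter-rater variability.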

Abstract

Reliable uncertainty quantification is essential for deploying image segmentation models in systems where inter-rater variability among experts is significant and must be accounted for to ensure dependable performance. A key unresolved question is whether multiple annotations per case are required during training to obtain robust uncertainty estimates. In this work, we provide analytical and empirical evidence addressing this issue. Using nine diverse publicly available datasets and the nnU-Net framework extended with Ensemble, Bayesian, Probabilistic, and Hierarchical Probabilistic models, we systematically compare training with single- versus multi-annotator annotations. Uncertainty was assessed using two complementary approaches: probability maps and disagreement-as-class. Performance was measured with established metrics, including A-Dice, GED, and Dice. Results show that ensemble-based probability map approaches consistently outperform other methods and achieve comparable performance under both single- and multi-annotation settings. In contrast, for the disagreement-as-class approach, the multi-annotation setting provides significant advantages over the single-annotation setting for capturing inter-expert variability, particularly in uncertainty-class segmentation. The numerical findings are supported by analytical arguments. These findings indicate that multiple annotations per case may not be strictly necessary for training effective uncertainty-aware segmentation models, offering practical implications for reducing annotation costs and enabling scalable development of reliable uncertainty-aware systems.
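
The two uncertainty representations mentioned in the abstract can be illustrated with a small sketch. It rests on assumptions that the abstract does not confirm: that a probability map is the per-pixel mean of the ensemble members' foreground probabilities, and that disagreement-as-class assigns a separate label to pixels on which annotators disagree. The function names and the label coding (0 = background, 1 = foreground, 2 = disagreement) are hypothetical.

```python
import numpy as np

def ensemble_probability_map(member_probs):
    """Average per-pixel foreground probabilities over ensemble members (assumed)."""
    return np.mean(np.stack(member_probs, axis=0), axis=0)

def disagreement_as_class(annotations):
    """Build a 3-class label map from several binary annotations (assumed coding).

    0 = every annotator marked background
    1 = every annotator marked foreground
    2 = annotators disagree (the "uncertainty class")
    """
    stack = np.stack([a.astype(bool) for a in annotations], axis=0)
    all_fg = stack.all(axis=0)
    any_fg = stack.any(axis=0)
    labels = np.zeros(stack.shape[1:], dtype=np.uint8)
    labels[all_fg] = 1
    labels[any_fg & ~all_fg] = 2
    return labels

# Toy example: two ensemble members and two expert annotations on a 2x2 image.
probs = [np.array([[0.2, 0.8], [0.0, 1.0]]),
         np.array([[0.4, 0.6], [0.0, 1.0]])]
raters = [np.array([[1, 1], [0, 0]]),
          np.array([[1, 0], [0, 0]])]

prob_map = ensemble_probability_map(probs)
label_map = disagreement_as_class(raters)
print(prob_map)
print(label_map)
```

Under these assumptions, a probability map can be produced from an ensemble trained on one annotation per case, whereas the disagreement class can only be constructed directly when several annotations of the same case are available. This is consistent with the abstract's finding that the multi-annotation setting matters mainly for the disagreement-as-class approach.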
