Digital pathology enables large multi-center studies of histological specimens, but differences in staining protocols and slide quality can compromise the comparability of quantitative results. We analyzed 686 PicroSirius Red-stained liver biopsies from 4 independent cohorts spanning more than 20 clinical sites to assess how stain variability affects automated fibrosis quantification and model uncertainty. A U-Net ensemble was trained to segment collagen and to estimate pixel- and tile-level predictive uncertainty. Across markedly heterogeneous staining conditions, the ensemble achieved strong segmentation performance (Dice 0.83–0.90) and produced informative uncertainty maps that identified artifacts and out-of-distribution regions. Epistemic uncertainty values typically remained below 0.002, so values above this level provide a practical criterion for flagging unreliable predictions. Our results demonstrate that ensemble-based uncertainty estimation complements stain-standardization efforts by quantifying prediction confidence directly from model outputs, improving the reliability and interpretability of collagen proportionate area measurements across multi-center datasets. This framework supports more trustworthy and reproducible digital-pathology workflows for fibrosis assessment and other histological applications.
• A retrospective cohort of liver biopsies collected from over 20 healthcare centers has been assembled.
• The cohort is characterized in terms of the collagen staining used for liver fibrosis assessment.
• A computational pipeline for quantifying collagen in liver histology slides has been developed and applied to the described cohorts.
• Uncertainty estimation is evaluated as a method to build trust in deep-learning-based collagen predictions.
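The abstract does not state how the ensemble's epistemic uncertainty, the Dice score, or the collagen proportionate area (CPA) are computed. The sketch below is therefore a minimal illustration, assuming the common mutual-information decomposition for ensembles: the entropy of the mean prediction minus the mean entropy of the individual members. Only the 0.002 value comes from the text. Applying it to a tile's mean epistemic uncertainty, the flat per-pixel probability layout, the 0.5 collagen cutoff, the tissue mask, and all function names (`pixel_uncertainty`, `summarize_tile`, `TileSummary`) are hypothetical. The sketch is not the authors' pipeline.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

EPS = 1e-12                  # keeps log() away from 0 and 1
EPISTEMIC_THRESHOLD = 0.002  # value quoted in the text; using it as a tile flag is an assumption


def binary_entropy(p: float) -> float:
    """Entropy (nats) of a Bernoulli(p) prediction: collagen vs. background."""
    p = min(max(p, EPS), 1.0 - EPS)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def pixel_uncertainty(member_probs: Sequence[float]) -> Tuple[float, float]:
    """Return (ensemble-mean probability, epistemic uncertainty) for one pixel.

    Assumed definition: entropy of the mean prediction minus the mean entropy
    of the members (mutual information); it is ~0 when all members agree.
    """
    mean_p = sum(member_probs) / len(member_probs)
    aleatoric = sum(binary_entropy(p) for p in member_probs) / len(member_probs)
    epistemic = binary_entropy(mean_p) - aleatoric
    return mean_p, max(epistemic, 0.0)  # clip tiny negative rounding error


def dice(pred: Sequence[bool], truth: Sequence[bool]) -> float:
    """Dice overlap between two binary masks given as flat pixel sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    denom = sum(pred) + sum(truth)
    return 1.0 if denom == 0 else 2.0 * inter / denom


@dataclass
class TileSummary:
    cpa: float        # collagen pixels / tissue pixels
    epistemic: float  # mean per-pixel epistemic uncertainty
    flagged: bool     # True if epistemic exceeds EPISTEMIC_THRESHOLD


def summarize_tile(probs: List[List[float]], tissue: Sequence[bool],
                   cutoff: float = 0.5) -> TileSummary:
    """probs[i][m]: collagen probability of ensemble member m at pixel i (hypothetical layout)."""
    stats = [pixel_uncertainty(px) for px in probs]
    collagen = [mean_p >= cutoff for mean_p, _ in stats]
    tissue_px = sum(tissue)
    collagen_px = sum(1 for c, t in zip(collagen, tissue) if c and t)
    epistemic = sum(e for _, e in stats) / len(stats)
    return TileSummary(
        cpa=collagen_px / tissue_px if tissue_px else 0.0,
        epistemic=epistemic,
        flagged=epistemic > EPISTEMIC_THRESHOLD,
    )


if __name__ == "__main__":
    tissue = [True, True]
    agree = [[0.90, 0.92, 0.91], [0.10, 0.08, 0.12]]     # members agree
    disagree = [[0.90, 0.20, 0.60], [0.10, 0.80, 0.40]]  # members disagree
    print(summarize_tile(agree, tissue))     # low epistemic, not flagged
    print(summarize_tile(disagree, tissue))  # high epistemic, flagged
```

Both toy tiles yield the same CPA of 0.5. Only the tile on which the members disagree exceeds the threshold. This illustrates why model-side uncertainty can flag unreliable regions that a CPA value alone would not reveal. Per-member variance of the probabilities is a common alternative measure of epistemic uncertainty. A threshold calibrated for one definition would not transfer directly to the other.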
Wojciechowska et al. (Sun,) studied this question.