PURPOSE: Endoscopy is critical for identifying rectal tumors, but it is prone to observer error. The aims of this study were to assess inter- and intra-observer variability in delineating rectal lesions in endoscopic images acquired during high-dose-rate (HDR) brachytherapy, and to develop a deep learning-based model for automatic tumor segmentation.

MATERIALS AND METHODS: Three expert annotators identified tumors, scarring, ulcers and radiation proctitis in 801 endoscopic images from 24 patients. Inter-observer variability was evaluated at both the whole-image and the contour level. Intra-observer variability was assessed by having the annotators re-annotate 15 images from 14 patients after six months. Four DeepLabV3 models with a ResNet50 backbone were trained using nested cross-validation: one on each annotator's contours and a fourth on majority-vote contours. Model performance was evaluated on 60 unseen images, which the annotators rated on a five-point Likert scale.

RESULTS: Manual annotations showed substantial variability for ulcers and radiation proctitis (average Dice: 0.36 and 0.57, respectively) compared with tumors (0.83). Intra-observer Dice scores were 0.72, 0.68 and 0.87 for the three annotators. The majority-vote model outperformed the individual annotator models (average Dice: 0.77) but produced many false positives, misclassifying ulcers and proctitis as tumors. On the unseen test set, annotators generally rated the model trained on their own contours more highly.

CONCLUSIONS: This work highlights the variability of the expert annotations used as ground truth for deep learning-based segmentation of rectal tumors in endoscopic images acquired during HDR brachytherapy. Automated contouring may provide a foundation for adaptive, AI-assisted brachytherapy workflows.
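The abstract reports agreement as Dice scores and trains one model on majority-vote contours, but it does not give the exact definitions used. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each annotation is a binary pixel mask for a single lesion class, that Dice is 2|A ∩ B| / (|A| + |B|) (taken as 1.0 when both masks are empty), that a majority-vote pixel is foreground when more than half of the annotators marked it (2 of 3 here), and that "average Dice" means the mean over annotator pairs. All function names and the toy masks are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence

# Hypothetical representation: binary mask, 1 = lesion pixel, 0 = background.
Mask = List[List[int]]


def dice(a: Mask, b: Mask) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|); assumed 1.0 when both masks are empty."""
    intersection = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    total = sum(map(sum, a)) + sum(map(sum, b))
    return 1.0 if total == 0 else 2.0 * intersection / total


def majority_vote(masks: Sequence[Mask]) -> Mask:
    """Assumed rule: a pixel is foreground if more than half of the masks mark it."""
    n = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    return [
        [1 if 2 * sum(m[r][c] for m in masks) > n else 0 for c in range(cols)]
        for r in range(rows)
    ]


def mean_pairwise_dice(masks: Sequence[Mask]) -> float:
    """Inter-observer agreement, assumed here to be the mean Dice over all annotator pairs."""
    pairs = list(combinations(masks, 2))
    return sum(dice(x, y) for x, y in pairs) / len(pairs)


# Toy 3x3 masks from three hypothetical annotators (not study data).
a = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
b = [[1, 1, 0], [1, 0, 0], [0, 0, 0]]
c = [[1, 1, 1], [1, 1, 0], [0, 0, 0]]

consensus = majority_vote([a, b, c])
print(consensus)                       # pixels marked by at least 2 of 3 annotators
print(round(dice(a, b), 3))            # agreement between annotators a and b
print(round(mean_pairwise_dice([a, b, c]), 3))
```

With three annotators the majority rule reduces to "at least two agree", so a single annotator's extra region (the top-right pixel in `c`) is dropped from the consensus mask. In the study itself, per-class handling and how averages were taken may differ from this sketch.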
Thibodeau-Antonacci et al. studied this question.