Abstract

Background
Severity assessment in ulcerative colitis (UC), central to clinical trials and practice, relies on the Mayo Clinic Endoscopic Subscore (MCES). However, MCES is a composite score, and its interpretation is subjective. This study quantitatively characterizes the inter-rater reliability (IRR) of MCES and of more granular mucosal features derived from the Ulcerative Colitis Endoscopic Index of Severity (UCEIS)¹, which may offer greater prognostic value. We present results from a large-scale annotation campaign involving five expert gastroenterologists.

Methods
We randomly sampled 1,200 quality-controlled frames from 80 unique endoscopy videos of 40 patients in the Phase III etrolizumab trial², using a sampling strategy stratified by anatomic location. Each frame was independently annotated by two experts (drawn from a pool of five) for MCES and five other mucosal categories; all labels are detailed in Figure 1. IRR was quantified using Cohen’s kappa (κ).

Results
IRR varied considerably across labels (Figure 1). A key finding was the consistently high reliability of ‘normal’ or ‘absent’ labels across categories (e.g., MCES ‘0’, κ = 0.71; Bleeding ‘None’, κ = 0.70; Ulcers ‘None’, κ = 0.71), suggesting that disagreement arises primarily from grading the severity of observed pathology. For MCES, the intermediate labels ‘1’ (κ = 0.34) and ‘2’ (κ = 0.45) had low reliability. Similarly, Ulcers and Erosions showed low reliability at the granular level (e.g., ‘Erosions’, κ = 0.45; ‘Superficial Ulcers’, κ = 0.28). Erythema showed the lowest overall reliability (κ = 0.38), and its ‘Mild’ label showed near-chance agreement (κ = 0.12). The intermediate ‘Patchy / Decreased’ vascular pattern (κ = 0.46) was also less reliable than its ‘Normal’ (κ = 0.65) and ‘Complete loss’ (κ = 0.71) counterparts. For Bleeding, the ‘Biopsy’ label was a key confounder because it was difficult to distinguish from other bleeding types in static frames.
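The abstract does not state how the per-label κ values were derived. One common approach, assumed here, is one-vs-rest binarization. Each label is recoded as that label versus all others for both raters, and Cohen's κ = (p_o − p_e) / (1 − p_e) is then applied. In this formula, p_o is the observed agreement and p_e is the agreement expected by chance from each rater's label frequencies.

The following is a minimal sketch using only the Python standard library. The function names and the toy ratings are hypothetical illustrations, not the study's code or data.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of both raters' marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every item; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


def per_label_kappa(
    rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]
) -> Dict[Hashable, float]:
    """One-vs-rest kappa for every label either rater used (an assumed scheme)."""
    labels = sorted(set(rater_a) | set(rater_b), key=str)
    return {
        # Recode both raters' labels as "this label" (True) versus "any other" (False).
        label: cohen_kappa([a == label for a in rater_a], [b == label for b in rater_b])
        for label in labels
    }


if __name__ == "__main__":
    # Hypothetical MCES grades from two raters for ten frames (toy data, not study data).
    rater_1 = [0, 0, 1, 1, 2, 2, 3, 3, 0, 1]
    rater_2 = [0, 0, 2, 1, 1, 2, 3, 2, 0, 0]
    print(round(cohen_kappa(rater_1, rater_2), 3))  # 0.459
    for label, kappa in per_label_kappa(rater_1, rater_2).items():
        print(label, round(kappa, 3))
```

Because p_e rises when one category dominates, a rarely used label can receive a modest or even negative κ despite high raw agreement. In the toy data, label 3 has 90% raw agreement but a one-vs-rest κ of about 0.62.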
The very low κ for ‘Large’ pseudopolyps (κ = −0.01) is attributed to their low prevalence (1.2%).

Conclusion
Our findings quantify the substantial subjectivity in interpreting key mucosal features. This reliability characterization is a critical first step toward developing more objective endoscopic scoring features. Addressing the least reliable labels, potentially by merging them, may be a necessary compromise that improves consistency at the cost of granularity. This work moves toward validating a set of reliable, objective endoscopic features with potential prognostic value for UC.

References
1. Travis SPL, et al. Developing an instrument to assess the endoscopic severity of ulcerative colitis: the Ulcerative Colitis Endoscopic Index of Severity (UCEIS). Gut. 2012;61(4):535-542.
2. Sandborn WJ, Vermeire S, Tyrrell H, et al. Etrolizumab for the Treatment of Ulcerative Colitis and Crohn’s Disease: An Overview of the Phase 3 Clinical Program. Adv Ther. 2020;37(7):3417-3431. doi:10.1007/s12325-020-01366-2

Conflict of interest
Blasco Fernandez, Pablo: Employee of F. Hoffmann-La Roche Ltd.
Bossuyt, Peter: Grant support for research from AbbVie, EG. Consulting fees from AbbVie, Bristol Myers Squibb, CIRC, Galapagos, Janssen, Jeito Capital, Lilly, Pentax, Pfizer, PSI-CRO, Roche, Takeda, Tetrameros. Speaker fees from AbbVie, AMC ICP, Amgen, Bristol Myers Squibb, Celltrion, Dr Falk Benelux, EG, Galapagos, Globalport, Lilly, Medtalks, Materia Prima, Pentax, Springer Media.
Daperno, Marco: Personal fees: Takeda, Johnson & Johnson, Gilead, Roche, Pfizer, AbbVie, Ferring, Chiesi, Alfasigma, Celltrion, Sanofi. Other (clinical trials): Takeda, Janssen, Roche.
Kopylov, Uri: Grants: Takeda, Janssen, AbbVie, Medtronic, Eli Lilly. Other (speaker and advisory fees): Takeda, Janssen, Eli Lilly, Roche, Celltrion, AbbVie, Medtronic, CTS, Pfizer, BMS.
Bouhnik, Yoram: Consultant for AbbVie, Alimentiv, Amgen, Biogen, Boehringer Ingelheim, Celltrion, Eli Lilly, Fresenius Kabi, Galapagos, Gilead, Hospira, Iterative Health, Janssen, Mayoly Spindler, Merck, MSD, Norgine, Pfizer, Roche, Sandoz, Sanofi, Takeda, UCB. Lectures for AbbVie, Celltrion, Fresenius Kabi, Galapagos, Gilead, Janssen, Lilly, MSD, Pfizer, Takeda. Grant support from AbbVie, Amgen, Fresenius Kabi, Janssen, Takeda, Viatris. Hospitality from AbbVie, Alimentiv, Amgen, Biogen, Celltrion, Eli Lilly, Ferring, Fresenius Kabi, Galapagos, Janssen, Mayoly Spindler, MSD, Nordic Pharma, Pfizer, Sandoz, Sanofi, Takeda, Viatris.
Karmiris, Konstantinos: Personal fees: speaker fees from AbbVie, BMS, Eli Lilly, Genesis, Innovis, Johnson & Johnson, Pfizer; consultancy or advisory board member fees from AbbVie, BMS, Faran, Ferring, Genesis, Johnson & Johnson, Pfizer, Roche, Takeda.
Benmansour, Fethallah: Employee of F. Hoffmann-La Roche Ltd.
Gutierrez Becker, Benjamin: Employee of F. Hoffmann-La Roche Ltd.
Fraessle, Stefan: Employee of F. Hoffmann-La Roche Ltd.
Stimpel, Bernhard: Employee of F. Hoffmann-La Roche Ltd.
Levitte, Steven: Employee of Genentech Inc.
Gomariz, Alvaro: Employee of F. Hoffmann-La Roche Ltd.