Abstract

INTRODUCTION Evaluation of mucosal injury via endoscopy is a primary biomarker for ulcerative colitis (UC) assessment. Endoscopic readouts serve as both screening criteria and endpoints for determining therapeutic efficacy in clinical trials. However, conventional measures rely on subjective interpretation and cannot fully capture the spatial detail contained in video evidence. We aimed to apply computational video analysis methods to enhance clinical trial readouts.

METHODS Colonoscopy videos from prior clinical studies, with corresponding symptomatic data, study arm, and cohort assignments, were retrospectively analyzed. Each subject had at least two longitudinal timepoints, with every timepoint assigned a Mayo Endoscopic Score (MES) via the standard central review process. Videos were processed using a computational pipeline that generated two novel endpoints: the Cumulative Disease Score (CDS), representing spatially averaged disease burden, and the Continuous MES score (MESc), a continuous analogue of MES without its four-class limitation. We compared MESc to MES to identify significant disagreements. These cases underwent blinded independent review for potential MES revision. For cases where MES showed no apparent endoscopic change between timepoints, the CDS was applied to detect previously unrecognized improvement.

RESULTS In one study (N = 99 total videos), the analytical readouts flagged 14 videos for secondary review. Blinded secondary review recommended MES adjustment for at least 8 of those videos. These changes were material in a study relying solely on MES as the endoscopic endpoint. In another study (N = 142 subjects), 61 subjects showed no endoscopic change by MES across two timepoints. The CDS revealed a statistically significant improvement in the treatment arm (p = 0.018), corresponding to symptomatic improvement (p = 0.007). No such improvement was observed in the placebo arm (p = 0.556), demonstrating the additional sensitivity of CDS to biologic efficacy.

DISCUSSION Artificial intelligence enables standardized and enhanced assessment of endoscopic disease severity in clinical trials. Its objectivity, spatial granularity, and reproducibility provide clear advantages over traditional subjective evaluation by human central review. Our findings suggest that integrating computational readouts with human scoring offers a practical and readily implementable strategy to enhance the value of endoscopic video evidence in clinical decision-making and therapeutic development.
Ma et al. (Thu,) studied this question.
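
The two screening steps described in METHODS (flagging videos where MESc and MES disagree, and isolating subjects whose MES did not change so that their CDS can be examined) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The record types, field names, the assumption that MESc lies on the same 0 to 3 scale as MES, and the disagreement threshold of 1.0 are all hypothetical choices made for illustration; the abstract does not state how a "significant disagreement" was defined or which statistical test was applied to the CDS changes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class VideoScore:
    """One scored video (hypothetical record layout)."""
    video_id: str
    mes: int      # central-read Mayo Endoscopic Score, 0..3
    mesc: float   # continuous model score, assumed to share the 0..3 scale


@dataclass
class SubjectPair:
    """Two timepoints for one subject (hypothetical record layout)."""
    subject_id: str
    arm: str        # e.g. "treatment" or "placebo"
    mes_t0: int
    mes_t1: int
    cds_t0: float   # Cumulative Disease Score at the first timepoint
    cds_t1: float   # Cumulative Disease Score at the second timepoint


def flag_disagreements(scores, threshold=1.0):
    """Return videos whose |MESc - MES| meets or exceeds the threshold.

    The threshold is an assumed value, not one taken from the study.
    """
    return [s for s in scores if abs(s.mesc - s.mes) >= threshold]


def cds_change_when_mes_unchanged(pairs):
    """Group CDS changes (t1 - t0) by arm for subjects with identical MES.

    Negative values indicate a lower CDS at the second timepoint.
    """
    changes = defaultdict(list)
    for p in pairs:
        if p.mes_t0 == p.mes_t1:
            changes[p.arm].append(p.cds_t1 - p.cds_t0)
    return dict(changes)
```

In a workflow like the one the abstract describes, the flagged videos would go to blinded secondary review, and the per-arm CDS changes would feed a paired statistical test chosen by the analyst.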