Background: Current artificial intelligence (AI) systems in gastrointestinal (GI) endoscopy primarily emphasize binary detection or static classification, providing limited support for the graded assessment of malignant potential that underpins clinical decision-making. We developed GastroMalign, a transformer-based framework designed to stratify GI lesions according to ordinal disease severity while maintaining clinical interpretability, addressing this unmet need in endoscopic risk assessment. Methods: This retrospective development and validation study used the publicly available GastroVision dataset, comprising 8000 de-identified endoscopic still images from the upper and lower gastrointestinal tract, including the esophagus, stomach, duodenum, colon, rectum, and terminal ileum. GastroMalign integrates a Vision Transformer (ViT) encoder with a Sequential Feature Learner that explicitly models ordinal disease severity along a benign-to-malignant spectrum. The framework produces both categorical risk classification and a continuous malignancy risk score. Images were stratified into training (80%), validation (10%), and test (10%) sets. Performance was compared with convolutional neural network (CNN) baselines and a Swin Transformer. Interpretability was assessed using Score-CAM visualizations reviewed by blinded expert endoscopists. Results: On the held-out test set (n = 800 images), GastroMalign achieved an overall accuracy of 80.06%, precision of 79.65%, recall of 80.06%, and F1-score of 79.17%, with a micro-averaged AUC of 0.98. In comparison, ResNet-50 and DenseNet-121 achieved accuracies of 32.42% and 36.77%, respectively, while the Swin Transformer achieved 60.56% accuracy (AUC = 0.93). Ablation analyses demonstrated a 17% absolute reduction in High-Risk lesion recall when the progression-aware module was removed. Continuous malignancy risk scores increased monotonically across ordinal classes, with mean values 0.72 for High-Risk/Malignant lesions. Score-CAM visualizations demonstrated 92% overlap with clinician-annotated lesion regions. Conclusions: GastroMalign delivers an interpretable, progression-aware AI framework for GI lesion risk stratification that outperforms existing CNN- and transformer-based models. Clinically, GastroMalign is intended as an adjunct decision-support tool during endoscopic review to standardize lesion risk stratification (benign to malignant spectrum), support management decisions (biopsy vs. resection vs. surveillance), and reduce operator-dependent variability by pairing ordinal risk outputs with interpretable visual explanations.
Boppana et al. (Thu,) studied this question.