Abstract

Background
Patients with ulcerative colitis (UC) face an elevated risk of colorectal cancer (CRC), which typically progresses through an inflammation-dysplasia-carcinoma sequence. Early identification and risk stratification of dysplasia are critical for clinical decision-making. However, existing prediction models often focus on progression from established low-grade dysplasia (LGD) and rely on single-modality data with traditional linear models. This study aimed to develop and validate a multimodal machine learning (ML) model to predict the risk of dysplasia in UC patients.

Methods
This retrospective study included 478 UC patients (356 inflammatory, 122 LGD) from Jinling Hospital (October 2014 - September 2024). The cohort was divided into a training set (n = 336) and a validation set (n = 142). We collected data including: 1) clinical history, 2) laboratory indicators (e.g., ESR, CRP), 3) endoscopic features (UCEIS score), and 4) pathological features (Nancy index). Missing data were handled using MICE imputation. LASSO (Least Absolute Shrinkage and Selection Operator) regression was used for feature selection. Ten distinct ML models (including Logistic Regression, SVM, Random Forest, XGBoost, and LightGBM) were trained and optimized using 10-fold cross-validation repeated 5 times. Model performance was assessed by the Area Under the Receiver Operating Characteristic Curve (AUC), calibration plots, and Decision Curve Analysis (DCA). The best-performing model was interpreted using SHAP (SHapley Additive exPlanations).

Results
LASSO regression selected 10 predictive features, including Age at onset, Histological activity, Disease duration, ESR, Montreal Classification, Stenosis, Polyp, and UCEIS scores. In the independent validation set, the LightGBM model achieved the strongest performance, with an AUC of 0.798 (95% CI: 0.721-0.876), an accuracy of 0.754, and an F1-score of 0.607. This model outperformed the other models, including traditional Logistic Regression (AUC 0.788). DCA showed that the LightGBM model had the highest clinical net benefit across most risk thresholds in the training data and remained competitive in the validation set. SHAP analysis identified Age at onset, Histological activity, Disease duration, and ESR as the most influential predictors of dysplasia risk.

Conclusion
A machine learning model, particularly LightGBM, that integrates multimodal data (clinical, laboratory, endoscopic, and pathological) can effectively predict dysplasia risk in UC patients. This tool may assist clinicians in identifying high-risk individuals, thereby optimizing surveillance strategies and facilitating early intervention.

Conflict of interest
Mr. Wang, Hongqin: No conflict of interest
Wang, Fangyu: No conflict of interest
Wei, Juan: No conflict of interest
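The two headline evaluation measures named in the Methods, AUC and the net benefit used in Decision Curve Analysis, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names and the toy labels and scores are hypothetical, and the net benefit formula is the standard one (true positives per patient minus false positives per patient weighted by the odds of the risk threshold).

```python
from typing import Sequence


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is scored
    above a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_score: Sequence[float],
                threshold: float) -> float:
    """Net benefit of treating every patient whose predicted risk is at
    or above `threshold`: TP/n - FP/n * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy in a decision curve: treat every patient."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Made-up toy data for illustration only (1 = dysplasia, 0 = inflammatory);
# these are not values from the study.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.3, 0.6, 0.4, 0.2, 0.2, 0.1]

toy_auc = auc(toy_labels, toy_scores)
toy_nb_model = net_benefit(toy_labels, toy_scores, threshold=0.5)
toy_nb_all = net_benefit_treat_all(toy_labels, threshold=0.5)
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing it with the treat-all line and with the treat-none line, which is zero at every threshold. A model is useful at a given threshold when its net benefit exceeds both.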
H Wang
F Wang
J Wei
Journal of Crohn's and Colitis
Southeast University
Nanjing Medical University
Second Affiliated Hospital of Nanjing Medical University
Wang et al. (Thu,) studied this question.
www.synapsesocial.com/papers/697310b0c8125b09b0d20577 — DOI: https://doi.org/10.1093/ecco-jcc/jjaf231.720