Crohn's disease (CD) is a chronic inflammatory condition of the gastrointestinal tract with a growing prevalence in children. Pediatric-onset CD shows more heterogeneous disease trajectories and treatment responses than adult-onset disease, posing significant management challenges. While early aggressive treatment may benefit patients with severe trajectories, no objective method exists to identify high-risk children at diagnosis. This prognostic gap forces reliance on subjective clinical judgment, potentially delaying critical interventions. This study aimed to use machine learning models to predict two first-year outcomes in the Canadian Children IBD Network inception cohort: 1) sustained versus non-sustained remission, with sustained remission defined as maintaining a post-remission Weighted Pediatric Crohn's Disease Activity Index (wPCDAI) <12.5 without inflammatory episodes, and 2) maximal disease severity (remission/mild, post-diagnosis wPCDAI <40, indicating minimal inflammatory activity, vs moderate/severe, wPCDAI ≥40, indicating substantial inflammation and need for treatment escalation). Nine algorithms were trained on baseline clinical, microbiome, and integrated clinical-microbiome datasets using repeated nested 3-fold cross-validation, with minimum redundancy maximum relevance (mRMR) feature selection, Bayesian hyperparameter optimization, and SHAP (SHapley Additive exPlanations) for model explainability. For sustained remission prediction, integrated models outperformed microbiome-only and clinical-only models, with integrated logistic regression achieving the highest mean AUC (0.763); key features included initial treatment at diagnosis, disease location, and wPCDAI at diagnosis, as well as taxa known to play a role in CD, such as Haemophilus and Lachnospiraceae.
For maximal disease severity prediction, microbiome models performed best, with Gaussian naïve Bayes reaching a mean AUC of 0.801 and highlighting microbes such as Clostridium and Veillonella as predictors of severe disease, while taxa such as Coprococcus and Romboutsia were associated with milder disease. Bayesian decision curve analysis of our top-performing models also demonstrated likely clinical utility at relevant decision thresholds. Our results suggest the potential of integrated machine learning approaches to support clinical decision-making in pediatric Crohn's disease. By enabling early identification of high-risk patients at diagnosis, this work paves the way for personalized treatment strategies that could improve long-term outcomes in this vulnerable population.
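To make the two outcome definitions concrete, the following is a minimal sketch of how the labels could be derived from a patient's post-diagnosis wPCDAI scores. Only the cutoffs (12.5 and 40) come from the text above. The function names, the input format (a chronological list of scores), and the simplification that "without inflammatory episodes" means every score after the first remission visit stays below 12.5 are illustrative assumptions, not the study's actual labeling procedure.

```python
from typing import Sequence

# Cutoffs taken from the outcome definitions in the abstract.
REMISSION_CUTOFF = 12.5  # wPCDAI below this is treated as remission
MODERATE_CUTOFF = 40.0   # wPCDAI at or above this is treated as moderate/severe


def max_severity_label(post_diagnosis_scores: Sequence[float]) -> str:
    """Label maximal first-year severity from post-diagnosis wPCDAI scores."""
    if not post_diagnosis_scores:
        raise ValueError("at least one post-diagnosis score is required")
    if max(post_diagnosis_scores) >= MODERATE_CUTOFF:
        return "moderate/severe"
    return "remission/mild"


def sustained_remission(post_diagnosis_scores: Sequence[float]) -> bool:
    """Return True if remission is reached and then maintained.

    ASSUMPTION: an "inflammatory episode" is approximated here as any
    score at or above the remission cutoff after remission is first
    reached. The study may have used additional criteria.
    """
    for i, score in enumerate(post_diagnosis_scores):
        if score < REMISSION_CUTOFF:
            # Remission reached at visit i; every later visit must stay below the cutoff.
            return all(s < REMISSION_CUTOFF for s in post_diagnosis_scores[i:])
    return False  # remission was never reached


# Hypothetical chronological scores for one patient.
scores = [45.0, 10.0, 5.0, 0.0]
print(max_severity_label(scores), sustained_remission(scores))
```

Under these assumptions, a patient can be labeled moderate/severe for maximal severity and still count as being in sustained remission, since the two outcomes are defined independently.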
Irvin Ng (Thu,) studied this question.