Assessment of inflammatory bowel disease (IBD) activity is limited by the need for biopsy processing and manual histological review. Multimodal microscopy combining coherent anti-Stokes Raman scattering (CARS), two-photon excited autofluorescence (TPEF), and second-harmonic generation (SHG) provides label-free images of colonic tissue with subcellular resolution. Earlier analyses using classical machine learning required annotated masks and handcrafted features, introducing dependence on manual input and limiting scalability. In this work, nine convolutional neural network (CNN) architectures were evaluated for IBD classification under three training regimes: partial fine-tuning, full fine-tuning, and training from scratch. Models were trained with and without patch-level augmentation and evaluated at both the patch and patient levels using patient-level cross-validation, with uncertainty quantified using bootstrap confidence intervals. ResNet architectures showed the most consistent performance, with ResNet50 providing the best balance between accuracy, stability, and parameter efficiency. Training from scratch often matched transfer-learning performance, indicating that ImageNet features do not always align well with multimodal microscopy data. DenseNet121, in particular, learned effectively from random initialization, highlighting the role of architectural connectivity in domain-specific learning. With ResNet50, partial fine-tuning with augmentation achieved near-perfect patient-level accuracy, while deeper ResNets offered no additional benefit. Lightweight models such as EfficientNetB0 and MobileNet depended on augmentation and complete retraining for stable convergence. Overall, these results show that architecture choice, adaptation capacity, and augmentation must be considered jointly.
For practical transfer-learning setups in multimodal IBD histopathology, ResNet50 with partial fine-tuning and augmentation provides an efficient and robust baseline, while suitably structured architectures can still learn effectively from scratch on limited datasets.
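The evaluation protocol described above (patient-level cross-validation, patch-to-patient aggregation, and bootstrap confidence intervals) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the majority-vote aggregation rule, the tie-breaking rule, and the percentile bootstrap are all assumptions made for illustration, since the abstract does not specify them.

```python
import random
from collections import Counter, defaultdict


def patient_level_folds(patient_ids, n_folds, seed=0):
    """Assign a fold index to every patch so that all patches of one
    patient share the same fold (no patient straddles train and test)."""
    unique = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique)
    fold_of = {pid: i % n_folds for i, pid in enumerate(unique)}
    return [fold_of[pid] for pid in patient_ids]


def aggregate_to_patient(patient_ids, patch_preds):
    """Turn patch-level predictions into one label per patient.
    Assumption: majority vote, ties resolved toward the smallest label."""
    votes = defaultdict(list)
    for pid, pred in zip(patient_ids, patch_preds):
        votes[pid].append(pred)
    result = {}
    for pid, preds in votes.items():
        counts = Counter(preds)
        best = max(counts.values())
        result[pid] = min(label for label, c in counts.items() if c == best)
    return result


def bootstrap_accuracy_ci(correct, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for accuracy.
    `correct` holds one 0/1 entry per unit (patch or patient);
    units are resampled with replacement n_boot times."""
    rng = random.Random(seed)
    n = len(correct)
    stats = sorted(sum(rng.choices(correct, k=n)) / n for _ in range(n_boot))
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return sum(correct) / n, lower, upper


if __name__ == "__main__":
    # Hypothetical toy data: five patches from three patients.
    pids = ["p1", "p1", "p2", "p2", "p3"]
    preds = [1, 1, 0, 1, 0]
    truth = {"p1": 1, "p2": 0, "p3": 0}
    print(patient_level_folds(pids, n_folds=3))
    patient_preds = aggregate_to_patient(pids, preds)
    hits = [int(patient_preds[p] == truth[p]) for p in sorted(truth)]
    print(bootstrap_accuracy_ci(hits))
```

For patient-level metrics, resampling patients rather than patches keeps correlated patches from the same patient together, which is one reasonable way to avoid overly narrow intervals.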
Kamran et al. (Tue,) studied this question.