Objectives: The Gastrointestinal (GI) Pathology subspecialty at our medical center is the highest-volume service in our Anatomic Pathology division, with colon polyps constituting 40% of specimens. The 2021 revisions to colorectal cancer screening guidelines lowered the screening age to 45, which may increase endoscopy procedures and specimen volume. We therefore sought to develop an artificial intelligence (AI) classifier to triage colorectal polyp specimens.
Methods: We retrospectively searched our pathology database from 2021 to 2023 for colon polyps with the following 12 final diagnoses: normal, lymphoid aggregate, inflammatory polyp, hyperplastic polyp, sessile serrated adenoma (with and without dysplasia), traditional serrated adenoma, tubular adenoma, tubulovillous adenoma, villous adenoma, high grade dysplasia (in any adenoma type), and invasive carcinoma. A total of 1191 representative slides (759 neoplastic and 432 nonneoplastic) were scanned using a Leica Aperio LV1 Scanner at 40x magnification. The images were used to train a multi-scale cross-attention multiple instance learning (MsCAMIL) network, a weakly supervised transformer-based model, to perform binary (triage: neoplastic versus nonneoplastic) and 12-way (final diagnosis) classification. Slides were randomly assigned to training (N=715, 60%), validation (N=119, 10%), and testing (N=357, 30%) sets. An additional 40 slides collected from two subsequent clinical service days were included to represent routine clinical cases. In addition, 40 external slides from multiple outside academic and private practice institutions were scanned on both a Leica Aperio GT450 LV1 and a Leica Aperio ScanScope AT2 to compare performance.
Results: We assessed the following diagnostic performance metrics: the macro-averaged F1-score (F1 = 2 × (precision × recall) / (precision + recall)), micro-averaged accuracy (mACC), and macro-averaged specificity.
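The three metrics named above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names, the one-vs-rest counting, the convention of returning 0 for an empty denominator, and the five toy labels are all assumptions made for illustration, and micro-averaged accuracy is taken here to mean the fraction of slides classified correctly.

```python
def per_class_counts(y_true, y_pred, labels):
    """Return {label: (tp, fp, fn, tn)} using one-vs-rest counting."""
    pairs = list(zip(y_true, y_pred))
    counts = {}
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        tn = len(pairs) - tp - fp - fn
        counts[c] = (tp, fp, fn, tn)
    return counts


def safe_div(num, den):
    # Assumed convention: a ratio with an empty denominator counts as 0.
    return num / den if den else 0.0


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for tp, fp, fn, _tn in per_class_counts(y_true, y_pred, labels).values():
        precision = safe_div(tp, tp + fp)
        recall = safe_div(tp, tp + fn)
        # F1 = 2 x (precision x recall) / (precision + recall)
        scores.append(safe_div(2 * precision * recall, precision + recall))
    return sum(scores) / len(scores)


def macro_specificity(y_true, y_pred, labels):
    """Unweighted mean of per-class specificity, tn / (tn + fp)."""
    counts = per_class_counts(y_true, y_pred, labels).values()
    specs = [safe_div(tn, tn + fp) for _tp, fp, _fn, tn in counts]
    return sum(specs) / len(specs)


def micro_accuracy(y_true, y_pred):
    """Fraction of slides whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Made-up toy labels for the binary (triage) task; not data from the study.
labels = ["neoplastic", "nonneoplastic"]
y_true = ["neoplastic", "neoplastic", "neoplastic",
          "nonneoplastic", "nonneoplastic"]
y_pred = ["neoplastic", "neoplastic", "nonneoplastic",
          "nonneoplastic", "nonneoplastic"]

print(macro_f1(y_true, y_pred, labels))
print(macro_specificity(y_true, y_pred, labels))
print(micro_accuracy(y_true, y_pred))
```

The same functions apply unchanged to the 12-way task by passing the 12 diagnosis labels; macro averaging then weights rare diagnoses as heavily as common ones, which is one reason a macro-averaged score can fall sharply when a few small classes are misclassified.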
In binary classification, the model achieved its highest accuracy and F1-score of 95.18% and 97.42% in archived and routine clinical cases, respectively, with the best discriminatory performance obtained when using overlapping patches of microscopic fields from 10x and 20x magnifications. In 12-way classification, the F1-score decreased to 74% and 57% in archived and routine cases, respectively. In the external set, the binary classification performance decreased to 86%, with misclassifications occurring primarily in slides from a single outside institution.
Conclusion: In daily clinical workflow, MsCAMIL networks can function as an efficient triage system by screening colon polypectomy specimens for neoplasm. In binary classification, the classifier showed >95% specificity and accuracy in both archived and routine clinical cases. Multi-institutional collaborations will be necessary to expand the patient population and further validate this tool for clinical use.
Ikezogwo et al. (Mon,) studied this question.