Accurate preoperative grading of pancreatic neuroendocrine tumors (PanNETs), specifically differentiating G2/G3 from G1, is pivotal for treatment planning but remains challenging with conventional biopsy. This study evaluated the diagnostic performance of imaging modalities for PanNET grading, comparing machine learning/deep learning (ML/DL) algorithms with conventional expert interpretation. We conducted a systematic review and meta-analysis of studies indexed in PubMed, Embase, Web of Science, and the Cochrane Library up to December 2025. Studies using CT, MRI, or endoscopic ultrasound (EUS) for PanNET grading were included. A bivariate random-effects model was used to calculate pooled sensitivity, specificity, and area under the curve (AUC). Subgroup analyses were performed to investigate heterogeneity and the impact of validation strategies. Seventeen studies comprising 928 patients were included. The overall pooled sensitivity and specificity were 0.77 (95% CI: 0.71–0.83) and 0.83 (95% CI: 0.77–0.87), respectively, with an AUC of 0.87. ML/DL models demonstrated significantly higher sensitivity than conventional imaging (0.84 vs. 0.71, p < 0.01) but significantly lower specificity (0.78 vs. 0.87, p < 0.01). Multicenter studies showed a trend toward higher diagnostic metrics than single-center cohorts. The performance gap between external and internal validation narrowed when the analysis was restricted to AI subgroups, suggesting the robustness of modern algorithms. Imaging-based analysis offers high diagnostic accuracy for PanNET grading. A theoretical sequential diagnostic strategy is suggested: using AI’s high sensitivity for initial screening, followed by expert radiological review to ensure specificity.
Zhang et al. studied this question.
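The sequential strategy suggested in the abstract can be illustrated with a short calculation. The sketch below is not part of the study: it assumes a simple serial rule (a lesion is called G2/G3 only if both the AI screen and the expert review are positive) and assumes the two readers' errors are conditionally independent given the true grade, which the abstract does not establish. It plugs in the pooled subgroup estimates purely as example inputs; the function name `serial_both_positive` is hypothetical.

```python
def serial_both_positive(se1, sp1, se2, sp2):
    """Combined accuracy of two tests applied in series.

    A case is called positive only if BOTH tests are positive.
    Assumes conditional independence of the two tests given true
    disease status (an illustrative assumption, not a study result).
    """
    # A true positive must be detected by both tests.
    sensitivity = se1 * se2
    # A false positive must be produced by both tests.
    specificity = 1 - (1 - sp1) * (1 - sp2)
    return sensitivity, specificity


# Pooled subgroup estimates quoted in the abstract, used as example inputs.
ai_se, ai_sp = 0.84, 0.78          # ML/DL models
expert_se, expert_sp = 0.71, 0.87  # conventional expert interpretation

se, sp = serial_both_positive(ai_se, ai_sp, expert_se, expert_sp)
print(f"serial sensitivity = {se:.3f}, specificity = {sp:.3f}")
```

Under these assumptions, serial confirmation raises specificity but lowers sensitivity relative to either reader alone, so how the combined strategy performs in practice depends on how correlated the AI and expert errors are and would need prospective evaluation.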