What question did this study set out to answer?

The study evaluated the diagnostic performance of artificial intelligence (AI) and conventional imaging in grading pancreatic neuroendocrine tumors (PanNETs), focusing on differentiating G2/G3 from G1 tumors.

April 18, 2026 · Open Access

Diagnostic performance of artificial intelligence versus conventional imaging for differentiating G2/G3 from G1 pancreatic neuroendocrine tumors: a systematic review and meta-analysis

Key Points

Conducted a systematic review and meta-analysis of relevant studies
Included studies using CT, MRI, or endoscopic ultrasound (EUS) for PanNET grading
Calculated pooled sensitivity, specificity, and AUC using a bivariate random-effects model
Performed subgroup analyses to explore heterogeneity and validation strategies
Overall pooled sensitivity was 0.77 and specificity was 0.83, with an AUC of 0.87
AI (ML/DL) models showed higher sensitivity than conventional imaging (0.84 vs. 0.71, p < 0.01)
AI models had lower specificity than conventional imaging (0.78 vs. 0.87, p < 0.01)
Multicenter studies showed a trend toward better diagnostic metrics than single-center cohorts
The gap between external and internal validation performance narrowed when analysis was restricted to AI models
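The pooling step can be illustrated with a minimal Python sketch. It is a simplification, not the paper's method: instead of the bivariate random-effects model (which estimates logit sensitivity and logit specificity jointly, together with their between-study correlation), it pools each proportion separately with the univariate DerSimonian-Laird random-effects estimator on the logit scale. The `Study` class, the helper names, the 0.5 continuity correction, and the 2×2 counts are all hypothetical and chosen only for illustration; they are not data from this meta-analysis.

```python
import math
from dataclasses import dataclass


@dataclass
class Study:
    """Hypothetical 2x2 counts for one study; G2/G3 is the target (positive) class."""
    tp: int  # G2/G3 tumors called G2/G3
    fn: int  # G2/G3 tumors called G1
    fp: int  # G1 tumors called G2/G3
    tn: int  # G1 tumors called G1


def expit(x: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def pool_proportion(events, totals, z=1.96):
    """Univariate DerSimonian-Laird random-effects pooling on the logit scale.

    Adds 0.5 to each cell (continuity correction) so that 0% or 100% studies
    give finite logits. Returns (pooled, ci_low, ci_high, tau2).
    """
    ys, vs = [], []
    for e, n in zip(events, totals):
        a, b = e + 0.5, n - e + 0.5
        ys.append(math.log(a / b))   # logit of the corrected proportion
        vs.append(1 / a + 1 / b)     # approximate within-study variance
    w = [1 / v for v in vs]          # fixed-effect (inverse-variance) weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(ys) - 1)) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1 / (v + tau2) for v in vs]  # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return expit(y_re), expit(y_re - z * se), expit(y_re + z * se), tau2


def pooled_accuracy(studies):
    """Pool sensitivity tp/(tp+fn) and specificity tn/(tn+fp) separately."""
    sens = pool_proportion([s.tp for s in studies], [s.tp + s.fn for s in studies])
    spec = pool_proportion([s.tn for s in studies], [s.tn + s.fp for s in studies])
    return sens, spec


# Made-up counts for illustration only; NOT the studies in this meta-analysis.
studies = [Study(20, 5, 4, 30), Study(35, 8, 9, 41), Study(15, 6, 3, 25)]
(sens, s_lo, s_hi, _), (spec, p_lo, p_hi, _) = pooled_accuracy(studies)
print(f"pooled sensitivity {sens:.2f} (95% CI {s_lo:.2f}-{s_hi:.2f})")
print(f"pooled specificity {spec:.2f} (95% CI {p_lo:.2f}-{p_hi:.2f})")
```

For a real analysis the bivariate model is usually fitted with dedicated software, for example the `reitsma()` function in the R package mada; the univariate version above ignores the correlation between sensitivity and specificity across studies and does not produce a summary ROC curve or AUC.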

Abstract

Accurate preoperative grading of pancreatic neuroendocrine tumors (PanNETs), specifically differentiating G2/G3 from G1 tumors, is pivotal for treatment planning but is challenging with conventional biopsy. This study aimed to evaluate the diagnostic performance of imaging-based grading, in particular comparing machine learning/deep learning (ML/DL) algorithms with conventional expert interpretation. We conducted a systematic review and meta-analysis of studies indexed in PubMed, Embase, Web of Science, and the Cochrane Library up to December 2025. Studies using CT, MRI, or endoscopic ultrasound (EUS) for PanNET grading were included. A bivariate random-effects model was used to calculate pooled sensitivity, specificity, and area under the curve (AUC). Subgroup analyses were performed to investigate heterogeneity and the impact of validation strategy. Seventeen studies comprising 928 patients were included. The overall pooled sensitivity and specificity were 0.77 (95% CI: 0.71–0.83) and 0.83 (95% CI: 0.77–0.87), respectively, with an AUC of 0.87. ML/DL models demonstrated significantly higher sensitivity than conventional imaging (0.84 vs. 0.71, p < 0.01) but lower specificity (0.78 vs. 0.87, p < 0.01). Multicenter studies showed a trend toward higher diagnostic metrics than single-center cohorts. The performance gap between external and internal validation narrowed when the analysis was restricted to AI (ML/DL) models, suggesting that modern algorithms may generalize robustly. Imaging-based analysis offers high diagnostic accuracy for PanNET grading. We suggest a theoretical sequential diagnostic strategy: using the high sensitivity of AI for initial screening, followed by expert radiological review to ensure specificity.
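The suggested sequential strategy corresponds to serial testing: a tumor is called G2/G3 only if the AI model flags it and the expert review confirms it. The Python sketch below computes the combined accuracy of that rule from the pooled estimates quoted in the abstract. It rests on two assumptions made only for illustration: that AI and expert errors are conditionally independent given the true grade, and that the pooled conventional-imaging estimates describe the expert review step. Reads of the same images are likely to be correlated, so real combined performance would have to be measured directly. The function name `serial_both_positive` is hypothetical.

```python
def serial_both_positive(se_first: float, sp_first: float,
                         se_second: float, sp_second: float) -> tuple[float, float]:
    """Sensitivity and specificity of a serial 'both positive' rule.

    The second test (expert review) is applied only to cases the first test
    (AI screen) calls positive; the final call is positive only if both agree.
    ASSUMPTION: the two tests' errors are conditionally independent given the
    true grade, which is unlikely to hold exactly in practice.
    """
    sensitivity = se_first * se_second  # a G2/G3 tumor must be flagged by both tests
    specificity = 1 - (1 - sp_first) * (1 - sp_second)  # a G1 tumor is miscalled only if both err
    return sensitivity, specificity


# Pooled estimates quoted in the abstract: ML/DL first, conventional imaging second.
se, sp = serial_both_positive(0.84, 0.78, 0.71, 0.87)
print(f"combined sensitivity {se:.2f}, specificity {sp:.2f}")
# prints: combined sensitivity 0.60, specificity 0.97
```

Under these assumptions the serial rule raises specificity above either test alone while lowering sensitivity below either test alone, which is the usual trade-off of serial testing.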
