What question did this study set out to answer?

This research aims to evaluate how biparametric MRI quality affects the accuracy of a deep learning model and radiologists in detecting Grade Group 2 prostate cancer.

March 14, 2026 | Open Access

In Prostate Cancer Diagnosis, a Deep Learning Model Shows Increased Susceptibility to MRI Quality Variations Compared to Radiologists

Key Points

A deep learning model (nnU-Net) was used to segment Grade Group 2 prostate cancer.
The study included 3073 bpMRI scans: 1500 for training and 1000 for testing (PI-CAI cohort), and 573 for external validation (PROMIS cohort).
MRI quality was assessed with PI-QUAL scores, and the performance of both the DL model and the radiologists was measured by AUROC.
Bootstrap testing was used to calculate 95% confidence intervals and to test the significance of performance differences between quality subgroups.
On the external dataset, the radiologists achieved AUCR1 0.90 and AUCR2 0.80.
The DL model reached AUCDL-PIQUAL1 0.63 on reduced-quality scans, compared with AUCDL-PIQUAL2-3 0.71 on high-quality scans (p<0.05).
Radiologist performance remained stable across scan qualities, with no significant differences (p=0.35 for R1 and p=0.49 for R2).
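The bootstrap comparison mentioned in the key points can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the summary does not state the exact resampling scheme, so the case-level percentile bootstrap, the independent resampling of the two quality subgroups, and all function names (auroc, bootstrap_ci, bootstrap_subgroup_difference) are assumptions made here for illustration.

```python
import random


def auroc(labels, scores):
    """AUROC as the Mann-Whitney statistic: P(positive score > negative score), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def _resample_auc(labels, scores, rng):
    """AUROC on one case-level resample with replacement; redraws if a class is missing."""
    n = len(labels)
    while True:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:
            return auroc(ys, [scores[i] for i in idx])


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUROC (assumed scheme)."""
    rng = random.Random(seed)
    aucs = sorted(_resample_auc(labels, scores, rng) for _ in range(n_boot))
    lo = aucs[int((alpha / 2) * n_boot)]
    hi = aucs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


def bootstrap_subgroup_difference(group_a, group_b, n_boot=1000, seed=0):
    """Observed AUROC difference between two independent subgroups and a two-sided bootstrap p-value.

    Each group is a (labels, scores) pair, e.g. the PI-QUAL 1 scans and the PI-QUAL 2-3 scans.
    """
    rng = random.Random(seed)
    diffs = [
        _resample_auc(*group_a, rng) - _resample_auc(*group_b, rng)
        for _ in range(n_boot)
    ]
    observed = auroc(*group_a) - auroc(*group_b)
    frac_le = sum(d <= 0 for d in diffs) / n_boot
    frac_ge = sum(d >= 0 for d in diffs) / n_boot
    p_value = min(1.0, 2 * min(frac_le, frac_ge))
    return observed, p_value
```

Under these assumptions, one would pass the per-scan reference labels (1 for biopsy-confirmed GG2 cancer, 0 otherwise) and the per-scan model or reader scores for each PI-QUAL subgroup, for example bootstrap_subgroup_difference((labels_low, scores_low), (labels_high, scores_high)). The pure-Python AUROC is quadratic in the number of cases, which is acceptable for a few hundred scans but would usually be replaced by a library routine for larger cohorts.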

Abstract

Purpose: To assess the impact of biparametric MRI (bpMRI) scan quality, as determined by PI-QUAL scores, on the diagnostic performance of a deep learning (DL) model and radiologists in correctly detecting Grade Group (GG) 2 prostate cancer in men with suspected prostate cancer.

Materials and Methods: An nnU-Net GG2 cancer segmentation model used 1500 bpMRI scans for training, 1000 for testing (PI-CAI cohort), and 573 scans for external validation (PROMIS cohort). The external cohort analysis included MRI assessment by a radiologist (R1) using PI-RADS v2.1, with the original PROMIS study Likert scores serving as the second assessment (R2). Two readers (QR1, QR2) independently assessed the image quality of the PROMIS MRI scans and determined a consensus PI-QUAL (v2) score. The reference standard was GG2 cancer, confirmed by transperineal saturation biopsy. The diagnostic performance (AUC) of the model and the radiologists was compared. Bootstrap testing was used to calculate 95% confidence intervals (CI) and to determine the significance of performance differences between quality subgroups.

Results: On the external dataset, the readers achieved AUCR1 0.90 (0.87-0.92) and AUCR2 0.80 (0.76-0.83), respectively. The DL model's performance declined on reduced-quality scans (n=141; AUCDL-PIQUAL1 0.63 (0.53-0.71)), whereas it was higher on high-quality scans (n=432; AUCDL-PIQUAL2-3 0.71 (0.66-0.75); p<0.05). In contrast, the readers' performance did not differ significantly across scan qualities: AUCR1-PIQUAL1 0.88 (0.82-0.93) and AUCR1-PIQUAL2-3 0.91 (0.88-0.93) (p=0.35); AUCR2-PIQUAL1 0.79 (0.71-0.86) and AUCR2-PIQUAL2-3 0.80 (0.76-0.84) (p=0.49).

Conclusions: The diagnostic performance of the DL model declined significantly on reduced-quality MRI scans as determined by PI-QUAL scoring, whereas radiologists demonstrated greater robustness to degraded conditions.
