Purpose: To assess the impact of biparametric MRI (bpMRI) scan quality, as determined by PI-QUAL scores, on the diagnostic performance of a deep learning (DL) model and radiologists in detecting Grade Group (GG) 2 prostate cancer in men with suspected prostate cancer.

Materials and Methods: An nnU-Net GG2 cancer segmentation model was trained on 1500 bpMRI scans and tested on 1000 (PI-CAI cohort), with 573 scans used for external validation (PROMIS cohort). The external cohort analysis included MRI assessment by a radiologist (R1) using PI-RADS v2.1, with the original PROMIS study Likert scores serving as the second assessment (R2). Two readers (QR1, QR2) independently assessed the image quality of the PROMIS MRI scans and determined a consensus PI-QUAL (v2) score. The reference standard was GG2 cancer, confirmed by transperineal saturation biopsy. The diagnostic performance of the model and the radiologists was compared using the area under the receiver operating characteristic curve (AUC). Bootstrap testing was used to calculate 95% confidence intervals (CI) and to determine the significance of performance differences between quality subgroups.

Results: On the external dataset, the readers achieved AUC_R1 0.90 (95% CI 0.87-0.92) and AUC_R2 0.80 (95% CI 0.76-0.83), respectively. The DL model's performance was lower on reduced-quality scans (n=141; AUC_DL-PIQUAL1 0.63, 95% CI 0.53-0.71) than on high-quality scans (n=432; AUC_DL-PIQUAL2-3 0.71, 95% CI 0.66-0.75; p<0.05). In contrast, the readers' performance did not differ significantly across scan qualities (AUC_R1-PIQUAL1 0.88, 95% CI 0.82-0.93, vs AUC_R1-PIQUAL2-3 0.91, 95% CI 0.88-0.93, p=0.35; AUC_R2-PIQUAL1 0.79, 95% CI 0.71-0.86, vs AUC_R2-PIQUAL2-3 0.80, 95% CI 0.76-0.84, p=0.49).

Conclusions: The diagnostic performance of the DL model declined significantly on reduced-quality MRI scans as determined by PI-QUAL scoring, whereas radiologists demonstrated greater robustness to degraded conditions.
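
The bootstrap comparison between quality subgroups described in the Materials and Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the resampling scheme, so the percentile CI, the two-sided p-value rule, the number of resamples, and all function names (`auc`, `resample`, `bootstrap_auc_difference`, `synthetic_subgroup`) are assumptions made here for illustration. The data are synthetic and unrelated to the PI-CAI or PROMIS cohorts.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def resample(labels, scores, rng):
    """Draw cases with replacement; redraw if one class is missing from the sample."""
    n = len(labels)
    while True:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:
            return ys, [scores[i] for i in idx]


def percentile(sorted_vals, q):
    """Simple nearest-index percentile on an already sorted list, q in [0, 1]."""
    return sorted_vals[int(round(q * (len(sorted_vals) - 1)))]


def bootstrap_auc_difference(low, high, n_boot=500, seed=0):
    """Compare AUCs of two independent subgroups, each given as (labels, scores).

    Assumed scheme: percentile 95% CIs and a two-sided p-value taken as twice
    the smaller tail fraction of bootstrap differences on either side of zero.
    """
    rng = random.Random(seed)
    aucs_low, aucs_high, diffs = [], [], []
    for _ in range(n_boot):
        a_low = auc(*resample(low[0], low[1], rng))
        a_high = auc(*resample(high[0], high[1], rng))
        aucs_low.append(a_low)
        aucs_high.append(a_high)
        diffs.append(a_high - a_low)
    aucs_low.sort()
    aucs_high.sort()
    below = sum(d <= 0 for d in diffs) / n_boot
    above = sum(d >= 0 for d in diffs) / n_boot
    return {
        "auc_low": auc(*low),
        "ci_low": (percentile(aucs_low, 0.025), percentile(aucs_low, 0.975)),
        "auc_high": auc(*high),
        "ci_high": (percentile(aucs_high, 0.025), percentile(aucs_high, 0.975)),
        "p_value": min(1.0, 2 * min(below, above)),
    }


def synthetic_subgroup(n, separation, rng):
    """Hypothetical data: Gaussian scores whose class separation mimics scan quality."""
    labels = [int(rng.random() < 0.4) for _ in range(n)]
    scores = [rng.gauss(separation * y, 1.0) for y in labels]
    return labels, scores


data_rng = random.Random(42)
low_quality = synthetic_subgroup(60, 0.5, data_rng)    # weaker separation
high_quality = synthetic_subgroup(60, 1.5, data_rng)   # stronger separation
result = bootstrap_auc_difference(low_quality, high_quality)
print(result)
```

Resampling each subgroup separately reflects the fact that the reduced-quality and high-quality scans are disjoint sets of patients. A paired design, such as comparing the model against a reader on the same scans, would instead resample cases jointly.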
Pooch et al. (Sun,) studied this question.