Breast imaging is described and evaluated according to the Breast Imaging Reporting and Data System (BI-RADS). The use of standardized descriptors has helped standardize breast imaging evaluation. However, combining single descriptors in context into an overall classification of malignancy still depends on expert experience. In this study, we investigated whether a radiomics-based machine-learning model achieves diagnostic performance similar to that of human experts for breast cancer diagnosis on ultrasound images. We used radiomics along with patient information to train, test, and validate machine learning (ML) models for classifying breast masses on B-mode imaging and strain elastography (SE). The image data were obtained from women who underwent B-mode breast ultrasound (n=1204), SE (n=1174), and subsequent histopathologic evaluation at 12 study sites across 7 countries from 2016 to 2019. The ML models were trained and tested on data from 11 of the 12 sites and externally validated using the remaining site's data. We compared the diagnostic performance of the models to that of three human experts. In the external validation sets (n = 343 for B-mode, n = 321 for SE), the B-mode radiomics model (AUC 0.86, 95% CI 0.82 to 0.90) and SE radiomics model (AUC 0.83, 95% CI 0.78 to 0.87) showed good diagnostic performance. Both radiomics models performed on par with human B-mode experts (AUC 0.83, 95% CI 0.80 to 0.86) and significantly better than human SE experts (AUC 0.73, 95% CI 0.67 to 0.78).
Both models were well-calibrated.

Table: 169P. Performance comparison radiomics models and human experts

| Modality | B-mode ultrasound: B-mode experts (n=1206) | B-mode ultrasound: Radiomics model B-mode imaging (N=343) | Strain elastography: Strain elastography experts (N=1190) | Strain elastography: Radiomics model Strain elastography (N=333) |
| --- | --- | --- | --- | --- |
| AUROC – value (95% CI) | 0.83 (CI 0.80 to 0.86) | 0.86 (CI 0.82 to 0.90) | 0.73 (CI 0.67 to 0.78) | 0.83 (CI 0.78 to 0.87) |
| Sensitivity – % (95% CI); no. | 93.7% (0.91 to 0.96); 328 of 350 | 98.3% (0.94 to 1.00); 114 of 116 | 74.1% (0.69 to 0.79); 255 of 344 | 99% (0.95 to 1.00); 114 of 115 |
| Specificity – % (95% CI); no. | 25.9% (0.23 to 0.29); 222 of 856 | 33.5% (0.27 to 0.40); 76 of 226 | 61.6% (0.58 to 0.65); 521 of 846 | 32.0% (0.26 to 0.39); 70 of 219 |
| Negative predictive value – % (95% CI); no. | 91.0% (0.89 to 0.99); 222 of 244 | 96.2% (0.91 to 1.00); 78 of 81 | 85.4% (0.82 to 0.88); 521 of 610 | 98.6% (0.92 to 1.00); 70 of 71 |
| Calibration score: Spiegelhalter z | | 0.21 | | -1.45 |
| Calibration score: P-value | | 0.42 | | 0.07 |

We present the largest, international, radiomics-based machine-learning models for breast cancer diagnosis on ultrasound images, performing on par with human experts. Pending prospective validation, our findings have the potential to standardize image analysis by providing an objective and consistent approach to breast ultrasound interpretation.
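The table's percentages and calibration scores can be illustrated with a minimal Python sketch. It is not the study's code: the function names `proportion`, `spiegelhalter_z`, and `one_tailed_p` are hypothetical, the z statistic follows the standard textbook form of Spiegelhalter's test, and the one-tailed P-value convention is an assumption, since the abstract does not state which tail convention was used.

```python
import math


def proportion(count, total):
    """Fraction such as sensitivity (true positives / all malignant cases)."""
    return count / total


def spiegelhalter_z(outcomes, probabilities):
    """Spiegelhalter's z statistic for calibration of predicted probabilities.

    outcomes: iterable of 0/1 labels (1 = malignant).
    probabilities: predicted probability of malignancy for each case.
    Values near 0 indicate no evidence of miscalibration.
    Undefined (division by zero) if every prediction is exactly 0, 0.5, or 1.
    """
    pairs = list(zip(outcomes, probabilities))
    numerator = sum((y - p) * (1 - 2 * p) for y, p in pairs)
    variance = sum((1 - 2 * p) ** 2 * p * (1 - p) for _, p in pairs)
    return numerator / math.sqrt(variance)


def one_tailed_p(z):
    """One-tailed normal P-value for |z| (assumed convention, see lead-in)."""
    return 0.5 * math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Sensitivity of the B-mode experts from the reported counts (328 of 350).
    print(f"{100 * proportion(328, 350):.1f}%")
    # Toy predictions (made up for illustration, not study data).
    print(spiegelhalter_z([1, 0, 1, 0], [0.8, 0.2, 0.7, 0.4]))
    # P-values for the two reported z scores under the assumed convention.
    print(one_tailed_p(0.21), one_tailed_p(-1.45))
```

Under the one-tailed assumption, the reported z scores of 0.21 and -1.45 give P-values that round to the reported 0.42 and 0.07. This supports reading each z score and P-value as a pair, but it does not confirm the authors' exact procedure.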
He et al. (Wed,) studied this question.
Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context: