INTRODUCTION: This study aimed to develop and validate a machine learning (ML)-based model for predicting the risk of breast cancer (BC) in patients with BI-RADS 4 lesions. METHODS: A total of 216 breast lesions from 212 patients were included in a retrospective analysis. The data set was randomly split at a ratio of 7:3 into a training set (151 cases) and a validation set (65 cases). Logistic and LASSO regression were used to identify independent risk factors. Subsequently, eight ML models were constructed. After a comprehensive comparison, the optimal model was selected and visualized using the SHAP algorithm. RESULTS: Feature selection identified six parameters as independent risk factors. The Emax-2 shell cutoff (104.71 kPa) demonstrated the highest diagnostic efficacy for the BI-RADS 4a subgroup. Among the eight ML algorithms, the random forest model showed a risk of overfitting (AUC = 1.00), whereas the logistic regression (LR) model demonstrated superior stability. Consequently, the LR model was selected as the predictive model, and a nomogram was constructed based on it. DISCUSSION: In this study, the LR model improved the identification of BC among BI-RADS 4 lesions. However, this was a single-center study with a relatively small sample size, which limits the clinical applicability of the model. CONCLUSION: Among the models compared, the LR model was the best choice for predicting BC in BI-RADS 4 lesions and can help clinicians identify and treat BC at an early stage.
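The 7:3 split and the AUC figure quoted above can be illustrated with a minimal sketch. The abstract does not say how the split was implemented or how AUC was computed, so the function names `split_7_3` and `auc`, the fixed seed, and the toy scores below are hypothetical and serve only as illustration; this is not the authors' code.

```python
import random


def split_7_3(n_cases, seed=0):
    """Randomly split case indices 7:3 into training and validation sets.

    The seed is an assumption made here for reproducibility.
    """
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    n_train = int(n_cases * 0.7)  # 216 * 0.7 = 151.2, truncated to 151
    return indices[:n_train], indices[n_train:]


def auc(labels, scores):
    """AUC as the fraction of (malignant, benign) pairs ranked correctly.

    Ties between a malignant and a benign score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


train_idx, val_idx = split_7_3(216)
print(len(train_idx), len(val_idx))  # 151 65

# Toy scores (not study data): perfect separation yields AUC = 1.0.
print(auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))  # 1.0
```

An AUC of exactly 1.00 means every malignant case was scored above every benign one. When that happens on the data a model was fitted to, it is a common sign of memorization, which is consistent with the abstract flagging the random forest result as a possible overfitting risk.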
Zhao et al. (Wed,) studied this question.