What question did this study set out to answer?

The study aims to develop and validate a machine learning-based model to predict the risk of breast cancer in patients with BI-RADS 4 breast lesions.

June 18, 2026

Creation and Verification of a Machine Learning-Based Model for Predicting Breast Cancer Risk in Patients with BI-RADS 4 Breast Lesions

Key Points

The study aims to develop and validate a machine learning-based model to predict the risk of breast cancer in patients with BI-RADS 4 breast lesions.
A total of 216 breast lesions from 212 patients were analyzed retrospectively.
The data were randomly split into training (151 cases) and validation (65 cases) sets at a 7:3 ratio.
Logistic and LASSO regression were used to identify independent risk factors, and eight machine learning (ML) models were then constructed.
Six independent risk factors were identified through feature selection.
The random forest model showed a potential risk of overfitting (AUC = 1.00), whereas the logistic regression (LR) model was more stable.
The LR model was selected as the predictive model and improved the identification of breast cancer among BI-RADS 4 lesions.
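
The 7:3 random split described in the key points can be sketched as follows. This is a minimal illustration, not the authors' code: the helper name `split_cases`, the seed, the rounding rule, and the use of integer case IDs are all assumptions. The study only states that 216 lesions were randomly divided into 151 training and 65 validation cases.

```python
import random


def split_cases(case_ids, train_fraction=0.7, seed=0):
    """Randomly split case IDs into training and validation lists.

    Hypothetical helper: the seed and the rounding rule are assumptions,
    not details reported by the study.
    """
    shuffled = list(case_ids)
    random.Random(seed).shuffle(shuffled)
    # Round down: 216 * 0.7 = 151.2, so 151 cases go to training.
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Integer IDs 0..215 stand in for the 216 lesions.
train_ids, val_ids = split_cases(range(216))
print(len(train_ids), len(val_ids))  # 151 65
```

Because the 216 lesions come from 212 patients, a lesion-level split like this one could place two lesions from the same patient in different sets. The summary does not say how the study handled this.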

Abstract

INTRODUCTION: To develop and validate a machine learning (ML)-based model for predicting the risk of breast cancer (BC) in patients with BI-RADS 4 breast lesions.

METHODS: A total of 216 breast lesions from 212 patients were included in a retrospective analysis. The data set was randomly divided into a training set (151 cases) and a validation set (65 cases) at a ratio of 7:3. Logistic and LASSO regression were used to identify independent risk factors. Subsequently, eight ML models were constructed. After a comprehensive comparison, the optimal model was selected and visualized using the SHAP algorithm.

RESULTS: Feature selection identified six parameters as independent risk factors. The Emax-2 shell cutoff (104.71 kPa) demonstrated the highest diagnostic efficacy for the 4a subgroup. Among the eight ML algorithms, the random forest model showed a potential risk of overfitting (AUC = 1.00), whereas the logistic regression (LR) model demonstrated superior stability. Consequently, the LR model was selected as the predictive model, and a nomogram was constructed based on it.

DISCUSSION: In this study, the LR model enhanced the ability to identify BC among BI-RADS 4 lesions. However, this was a single-center study with a relatively small sample size, which limits the clinical applicability of the model.

CONCLUSION: The LR model is the best choice for predicting BC in BI-RADS 4 lesions and can help clinicians identify and treat BC at an early stage.
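
The overfitting concern raised for the random forest model can be illustrated with a small AUC calculation. The sketch below assumes AUC is computed as the probability that a malignant lesion receives a higher score than a benign one. The function `auc`, the labels, and the scores are made-up illustrations, not study data or the authors' code.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only): 1 = malignant, 0 = benign.
train_labels = [0, 0, 1, 1]
train_scores = [0.1, 0.2, 0.8, 0.9]   # perfectly separated
val_labels = [0, 0, 1, 1]
val_scores = [0.1, 0.4, 0.35, 0.8]    # one of four pairs is misordered

train_auc = auc(train_labels, train_scores)
val_auc = auc(val_labels, val_scores)
gap = train_auc - val_auc
print(train_auc, val_auc, gap)  # 1.0 0.75 0.25
```

A perfect AUC on the training data paired with a noticeably lower validation AUC is a common sign of overfitting. The abstract does not state which data set the reported AUC = 1.00 refers to.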
