Breast cancer prognosis is influenced by diverse clinical factors, necessitating advanced predictive tools for personalized treatment. This study developed machine learning models to predict adverse outcomes in breast cancer patients using comprehensive clinical data. We analyzed data from 4,242 patients (3,836 with stable progression and 406 with adverse outcomes) at a single institution. Feature selection via univariate and multivariate analyses and machine learning importance ranking identified 10 key predictors, including metastasis stage, hospitalization duration, total healthcare costs, and laboratory test results. Ten machine learning models were evaluated using 1:1, 1:2, and 1:3 undersampling ratios to address class imbalance. Performance was assessed by accuracy, sensitivity, specificity, the area under the curve (AUC), and precision. The Random Forest (RF) model achieved the best overall balance, with 88.69% accuracy, 0.733 sensitivity, and 0.952 specificity (AUC range: 0.881–0.898 across sampling ratios), demonstrating superior class separation. SHapley Additive exPlanations (SHAP) analysis identified metastasis (SHAP = +6.0) as the strongest risk factor, while marital status and the Luminal A subtype were protective. Logistic Regression (LR) also performed robustly (AUC > 0.88) and excelled in specificity (0.950 under 1:3 undersampling). Machine learning models, particularly RF and LR, effectively predicted prognosis by integrating clinical and laboratory features. RF's stability on imbalanced data and its interpretability via SHAP enhance its clinical utility. Future multi-center studies should validate these findings and explore causal mechanisms.
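The undersampling ratios and evaluation metrics above can be made concrete with a small sketch. The abstract does not describe the authors' implementation, so everything below is an assumption. This includes the function names `undersample` and `binary_metrics`, the label coding (1 = adverse outcome, 0 = stable progression), and the use of simple random undersampling without replacement. It shows what a 1:k minority-to-majority split means for the reported class counts. It also shows how accuracy, sensitivity, specificity, and precision follow from a confusion matrix.

```python
import random
from typing import Dict, List, Sequence


def undersample(labels: Sequence[int], ratio: int, seed: int = 0) -> List[int]:
    """Hypothetical random undersampling to a 1:ratio minority:majority split.

    Keeps every minority sample (label 1, assumed to mean "adverse outcome")
    and draws ratio * n_minority majority samples (label 0) without
    replacement. Returns the sorted row indices to keep.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    n_majority = min(len(majority), ratio * len(minority))
    return sorted(minority + rng.sample(majority, n_majority))


def _safe_div(num: int, den: int) -> float:
    # Return 0.0 instead of raising when a class is absent from the predictions.
    return num / den if den else 0.0


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics with label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": _safe_div(tp + tn, len(pairs)),
        "sensitivity": _safe_div(tp, tp + fn),  # recall on adverse outcomes
        "specificity": _safe_div(tn, tn + fp),  # recall on stable cases
        "precision": _safe_div(tp, tp + fp),
    }


if __name__ == "__main__":
    # Class counts taken from the abstract: 406 adverse (1), 3,836 stable (0).
    labels = [1] * 406 + [0] * 3836
    for ratio in (1, 2, 3):
        kept = undersample(labels, ratio)
        print(f"1:{ratio} -> {len(kept)} samples")
```

With the reported counts, the 1:1, 1:2, and 1:3 ratios keep 812, 1,218, and 1,624 patients, respectively. In a typical pipeline, undersampling is applied only to the training split so that the test set keeps the natural class balance. Whether this study did so is not stated in the abstract.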
Ma et al. studied this question.