What question did this study set out to answer?

The study aims to create a predictive model for adverse outcomes in breast cancer patients using clinical data and machine learning.

March 3, 2026 | Open Access

Predicting adverse prognostic outcomes in hospitalized breast cancer patients: development and validation of a risk model

Key Points

The study aims to create a predictive model for adverse outcomes in breast cancer patients using clinical data and machine learning.
Analyzed comprehensive clinical data from 4,242 hospitalized breast cancer patients.
Utilized univariate/multivariate analysis for feature selection and evaluated ten machine learning models.
Addressed class imbalance by undersampling at three ratios (1:1, 1:2, 1:3) and comparing model performance across them.
Measured performance using accuracy, sensitivity, specificity, AUC, and precision.
The Random Forest model achieved 88.69% accuracy, 0.733 sensitivity, and 0.952 specificity, with AUC ranging from 0.881 to 0.898 across the sampling ratios.
Metastasis was identified as the strongest risk factor with a SHAP value of +6.0.
Logistic Regression also performed robustly, with AUC > 0.88 and a specificity of 0.950 under 1:3 undersampling.
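
As a reading aid for the performance measures listed above, the following minimal Python sketch shows how accuracy, sensitivity, specificity, and precision follow from the four cells of a binary confusion matrix. The `ConfusionCounts` type, the `classification_metrics` function, and the example counts are hypothetical illustrations, not values or code from the study. AUC is omitted because it depends on predicted scores rather than on a single set of counts.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Cells of a binary confusion matrix (positive = adverse outcome)."""
    tp: int  # adverse outcomes correctly flagged
    fp: int  # stable patients incorrectly flagged
    tn: int  # stable patients correctly cleared
    fn: int  # adverse outcomes missed


def _ratio(numerator: int, denominator: int) -> float:
    """Divide safely, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(c: ConfusionCounts) -> dict:
    """Compute the count-based metrics named in the Key Points."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # recall on adverse cases
        "specificity": _ratio(c.tn, c.tn + c.fp),  # recall on stable cases
        "precision": _ratio(c.tp, c.tp + c.fp),
    }


# Illustrative counts only; these are NOT taken from the paper.
example = ConfusionCounts(tp=30, fp=5, tn=95, fn=10)
metrics = classification_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With imbalanced classes, accuracy is dominated by the majority (stable) group, which is why sensitivity and specificity are reported alongside it.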

Abstract

Breast cancer prognosis is influenced by diverse clinical factors, necessitating advanced predictive tools for personalized treatment. This study developed machine learning models to predict adverse outcomes in breast cancer patients using comprehensive clinical data. We analyzed data from 4,242 patients (3,836 with stable progression and 406 with adverse outcomes) at a single institution. Feature selection via univariate/multivariate analysis and machine learning importance ranking identified 10 key predictors, including metastasis stage, hospitalization duration, total healthcare costs, and laboratory test results. Ten machine learning models were evaluated using 1:1, 1:2, and 1:3 undersampling ratios to address class imbalance. Performance was assessed via accuracy, sensitivity, specificity, the area under the curve (AUC), and precision. The Random Forest (RF) model achieved an optimal balance with 88.69% accuracy, 0.733 sensitivity, and 0.952 specificity (AUC range: 0.881–0.898) across sampling ratios, demonstrating superior class separation. SHapley Additive exPlanations (SHAP) analysis revealed metastasis (SHAP = +6.0) as the strongest risk factor, while marital status and Luminal A subtype were protective. Logistic Regression (LR) also performed robustly (AUC > 0.88), excelling in specificity (0.950 under 1:3 undersampling). Machine learning models, particularly RF and LR, effectively predicted prognosis by integrating clinical and laboratory features. RF’s stability in imbalanced data and interpretability via SHAP enhanced clinical utility. Future multi-center studies should validate these findings and explore causal mechanisms.
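
To make the undersampling ratios concrete, the sketch below applies plain random undersampling to a label vector with the class sizes reported in the abstract (406 adverse, 3,836 stable). It is an illustration under stated assumptions, not the authors' pipeline: the `undersample` function is hypothetical, the ratios are read as adverse:stable, and the abstract does not say which undersampling method was used or whether it was applied to the full cohort or only to a training split.

```python
import random


def undersample(labels, majority_per_minority, seed=0):
    """Keep every minority case (label 1) and a random subset of majority
    cases (label 0), sized at `majority_per_minority` per minority case.

    Returns the sorted indices of the retained samples.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(majority), majority_per_minority * len(minority))
    kept_majority = rng.sample(majority, n_keep)
    return sorted(minority + kept_majority)


# Class sizes from the abstract: 406 adverse (1) and 3,836 stable (0).
labels = [1] * 406 + [0] * 3836

for ratio in (1, 2, 3):
    idx = undersample(labels, ratio)
    n_adverse = sum(labels[i] for i in idx)
    n_stable = len(idx) - n_adverse
    print(f"1:{ratio} -> {n_adverse} adverse, {n_stable} stable")
```

Under these assumptions the retained stable group grows from 406 to 812 to 1,218 cases as the ratio moves from 1:1 to 1:3. A wider ratio keeps more majority-class information but leaves the training set more imbalanced.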
