Regularized regression models for predicting advanced breast cancer risk demonstrated similar discrimination (AUC 0.689) and more favorable calibration than machine learning or conventional approaches.
Observational (n=968,178)
Yes
Do regularized regression and machine learning models improve the prediction of advanced breast cancer compared to conventional logistic regression in women undergoing screening mammography?
For predicting advanced breast cancers with rare outcomes and low dimensional features, regularized regression demonstrates similar discrimination and more favorable calibration compared to machine learning approaches.
AUC: 0.689 (regularized regression) vs 0.683 (conventional logistic regression)
Abstract. Background: Machine learning enables complex risk prediction models, but comparative performance with statistical approaches remains context-dependent. We compared statistical and machine learning models for predicting advanced breast cancer risk. Methods: Using data from 968,178 women (40–74 years) undergoing 2,796,459 annual or 812,126 biennial screening mammograms (2005–2019) in the Breast Cancer Surveillance Consortium, we cross-validated models predicting advanced breast cancer within 12 months (annual) or 24 months (biennial) following screening. Models included conventional logistic regression, regularized regressions (LASSO, Elastic net), and machine learning methods (random forests, gradient boosting), considering a modest number of clinical and demographic predictors. Performance was assessed using calibration and area under the receiver operating characteristic curve (AUC). Results: Discrimination was similar across models (AUC 0.677–0.690). Calibration differences were more pronounced. Regularized regressions achieved the most favorable calibration overall and across racial and ethnic groups, with AUC 0.689 (95%CI = 0.676–0.701). Gradient boosting showed comparable AUC but suboptimal calibration (calibration slope 1.12; 95%CI = 1.04–1.20). Conventional logistic regression had slightly lower AUC (0.683; 95%CI = 0.671–0.696) and calibration slope of 0.90 (95%CI = 0.83–0.96). Regression-based approaches were generally well calibrated across racial and ethnic groups (E/O ratio 0.96–1.03; calibration intercept −0.03 to 0.04), with some subgroup deviations in calibration slopes (1). Conclusions: For predicting advanced breast cancers, regularized regression demonstrated similar discrimination and generally more favorable calibration than other approaches. Impact: In settings with rare outcomes and low dimensional features, regularized regression may offer a practical balance between performance and interpretability.
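The performance measures named in the abstract (AUC, E/O ratio, calibration intercept, calibration slope) can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, the data are simulated, and the definitions used here are the common textbook ones (intercept fitted with the slope fixed at 1; slope fitted from a logistic regression of the outcome on the logit of the predicted risk), which are assumed rather than confirmed to match the study. Only the Python standard library is used.

```python
import math
import random


def logit(p):
    return math.log(p / (1.0 - p))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def auc(y, p):
    """AUC via the rank-sum (Mann-Whitney) formula; tied scores get average ranks."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    ranks = [0.0] * len(p)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and p[order[j + 1]] == p[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    rank_sum = sum(r for r, yi in zip(ranks, y) if yi == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def expected_observed_ratio(y, p):
    """E/O ratio: sum of predicted risks divided by the number of observed events."""
    return sum(p) / sum(y)


def calibration_intercept(y, p, iters=25):
    """Intercept a in logit(P(y=1)) = a + logit(p), slope fixed at 1 (Newton's method)."""
    lp = [logit(pi) for pi in p]
    a = 0.0
    for _ in range(iters):
        mu = [sigmoid(a + l) for l in lp]
        grad = sum(yi - mi for yi, mi in zip(y, mu))
        hess = sum(mi * (1.0 - mi) for mi in mu)
        a += grad / hess
    return a


def calibration_slope(y, p, iters=25):
    """Slope b in logit(P(y=1)) = a + b * logit(p), fitted by Newton's method."""
    lp = [logit(pi) for pi in p]
    a, b = 0.0, 1.0
    for _ in range(iters):
        mu = [sigmoid(a + b * l) for l in lp]
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * l for yi, mi, l in zip(y, mu, lp))
        w = [mi * (1.0 - mi) for mi in mu]
        h00 = sum(w)
        h01 = sum(wi * l for wi, l in zip(w, lp))
        h11 = sum(wi * l * l for wi, l in zip(w, lp))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det  # solve the 2x2 Newton system
        b += (h00 * g1 - h01 * g0) / det
    return b


# Simulated example (assumption: outcomes drawn from the predicted risks,
# so the predictions are well calibrated by construction).
random.seed(0)
p_sim = [sigmoid(random.gauss(-2.0, 1.0)) for _ in range(5000)]
y_sim = [1 if random.random() < pi else 0 for pi in p_sim]

print("AUC:", round(auc(y_sim, p_sim), 3))
print("E/O ratio:", round(expected_observed_ratio(y_sim, p_sim), 3))
print("Calibration intercept:", round(calibration_intercept(y_sim, p_sim), 3))
print("Calibration slope:", round(calibration_slope(y_sim, p_sim), 3))
```

Under these definitions, a slope below 1 (such as the 0.90 reported for conventional logistic regression) indicates predictions that are too extreme, while a slope above 1 (such as the 1.12 reported for gradient boosting) indicates predictions that are too compressed. An E/O ratio near 1 and an intercept near 0 indicate that average predicted risk matches the observed event rate.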
Chen et al. conducted an observational study of advanced breast cancer risk prediction (n=968,178). Regularized regressions and machine learning models were compared with conventional logistic regression for predicting advanced breast cancer within 12 or 24 months following screening, assessed by AUC (regularized regression: 0.689; 95% CI 0.676–0.701). Regularized regression models for predicting advanced breast cancer risk demonstrated similar discrimination (AUC 0.689) and more favorable calibration than machine learning or conventional approaches.