What is the clinical evidence from this study?

Study design: Observational. Population: Women aged 40–74 undergoing screening mammography (n=968,178). Intervention: Regularized regressions and machine learning models vs. conventional logistic regression. Primary outcome: Advanced breast cancer within 12 or 24 months following screening, assessed by discrimination (AUC) and calibration; regularized regression AUC 0.689 (95% CI 0.676–0.701).

What question did this study set out to answer?

The study aims to compare the performance of statistical and machine learning models in predicting advanced breast cancer risk.

May 16, 2026

Performance of Statistical and Machine Learning Risk Prediction Models for Advanced Breast Cancers

Key Result

Regularized regression models for predicting advanced breast cancer risk demonstrated similar discrimination (AUC 0.689) and more favorable calibration than machine learning or conventional approaches.

Key Points

Data from 968,178 women undergoing screening mammography from 2005 to 2019 were used.
Models included conventional logistic regression, LASSO, elastic net, random forests, and gradient boosting.
Performance was assessed via calibration and area under the receiver operating characteristic curve (AUC).
Discrimination was similar across models, with AUCs ranging from 0.677 to 0.690.
Regularized regressions offered the most favorable calibration, with an AUC of 0.689 (95% CI = 0.676–0.701).
Gradient boosting showed an AUC of 0.688 but had suboptimal calibration, with a calibration slope of 1.12 (95% CI = 1.04–1.20).
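The two performance dimensions above, discrimination and calibration, can be made concrete with a short sketch. The standard-library Python below computes the AUC as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case (ties count one half), and an expected-to-observed (E/O) ratio as the sum of predicted risks divided by the number of observed events. The function names and toy data are hypothetical and not taken from the study; this summary does not describe the authors' exact estimation procedures or how their confidence intervals were obtained.

```python
"""Minimal sketch: discrimination (AUC) and an expected/observed (E/O) ratio.

Assumptions (not from the study): predictions are probabilities in (0, 1),
outcomes are coded 0/1, and the toy data below are invented.
"""
from typing import Sequence


def auc(y: Sequence[int], p: Sequence[float]) -> float:
    """Probability that a random case outranks a random non-case.

    Ties count as one half (Mann-Whitney formulation). The pairwise loop is
    O(n_cases * n_noncases): fine for a toy example, not for millions of screens.
    """
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def expected_observed_ratio(y: Sequence[int], p: Sequence[float]) -> float:
    """Sum of predicted risks (expected events) over observed events; ideal is 1."""
    observed = sum(y)
    if observed == 0:
        raise ValueError("no observed events")
    return sum(p) / observed


if __name__ == "__main__":
    # Invented toy data: 3 cases among 8 screens.
    y = [0, 0, 1, 0, 1, 0, 0, 1]
    p = [0.10, 0.20, 0.70, 0.30, 0.40, 0.05, 0.50, 0.60]
    print(f"AUC = {auc(y, p):.3f}")
    print(f"E/O = {expected_observed_ratio(y, p):.3f}")
```

An E/O ratio below 1 indicates that a model predicts fewer events than were observed; the AUC is unaffected by any monotone rescaling of the predicted risks, which is why models can share similar AUCs while differing in calibration.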

Study Design

Type

Observational (n=968,178)

Multicenter

Yes

Structured PICO

Do regularized regression and machine learning models improve the prediction of advanced breast cancer compared to conventional logistic regression in women undergoing screening mammography?

Population

968,178 women (40–74 years) undergoing 2,796,459 annual or 812,126 biennial screening mammograms (2005–2019) in the Breast Cancer Surveillance Consortium

Intervention

Machine learning models (random forests, gradient boosting) and regularized regressions (LASSO, elastic net)

Comparator

Conventional logistic regression

Outcome

Advanced breast cancer within 12 months (annual) or 24 months (biennial) following screening, assessed using calibration and area under the receiver operating characteristic curve (AUC) (surrogate endpoint)

For predicting advanced breast cancers, a setting with rare outcomes and low-dimensional features, regularized regression demonstrates similar discrimination and more favorable calibration compared with machine learning approaches.

Main Result

AUC: 0.689 (regularized regression) vs 0.683 (conventional logistic regression)

Abstract

Background: Machine learning enables complex risk prediction models, but its performance relative to statistical approaches remains context-dependent. We compared statistical and machine learning models for predicting advanced breast cancer risk.

Methods: Using data from 968,178 women (40–74 years) undergoing 2,796,459 annual or 812,126 biennial screening mammograms (2005–2019) in the Breast Cancer Surveillance Consortium, we cross-validated models predicting advanced breast cancer within 12 months (annual) or 24 months (biennial) following screening. Models included conventional logistic regression, regularized regressions (LASSO, elastic net), and machine learning methods (random forests, gradient boosting), using a modest number of clinical and demographic predictors. Performance was assessed using calibration and area under the receiver operating characteristic curve (AUC).

Results: Discrimination was similar across models (AUC 0.677–0.690), whereas calibration differed more. Regularized regressions achieved the most favorable calibration overall and across racial and ethnic groups, with an AUC of 0.689 (95% CI = 0.676–0.701). Gradient boosting showed a comparable AUC but suboptimal calibration (calibration slope 1.12; 95% CI = 1.04–1.20). Conventional logistic regression had a slightly lower AUC (0.683; 95% CI = 0.671–0.696) and a calibration slope of 0.90 (95% CI = 0.83–0.96). Regression-based approaches were generally well calibrated across racial and ethnic groups (E/O ratio 0.96–1.03; calibration intercept −0.03 to 0.04), with some subgroup deviations in calibration slopes (1).

Conclusions: For predicting advanced breast cancers, regularized regression demonstrated similar discrimination and generally more favorable calibration than other approaches.

Impact: In settings with rare outcomes and low-dimensional features, regularized regression may offer a practical balance between performance and interpretability.
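The calibration slope and intercept reported in the abstract are commonly obtained by logistic recalibration, that is, regressing the observed outcome on the logit of the predicted risk. The sketch below assumes those common definitions: the slope is b in logit P(y = 1) = a + b·logit(p) (ideal value 1), and the intercept is a in the same model with the slope fixed at 1 (ideal value 0, "calibration-in-the-large"). This summary does not state the authors' exact definitions or software. The function names are hypothetical, and the fitting uses plain Newton-Raphson without safeguards, so it assumes predicted risks strictly between 0 and 1, at least two distinct predicted risks for the slope, and data that are not perfectly separated.

```python
"""Hedged sketch: calibration slope and intercept via logistic recalibration.

Assumed definitions (common in the literature, not confirmed by the study):
  slope:     fit logit P(y=1) = a + b * logit(p); report b (ideal 1)
  intercept: fit logit P(y=1) = a + logit(p), slope fixed at 1; report a (ideal 0)
"""
import math
from typing import Sequence, Tuple


def _logit(p: float) -> float:
    # Assumes 0 < p < 1.
    return math.log(p / (1.0 - p))


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def calibration_slope(
    y: Sequence[int], p: Sequence[float], n_iter: int = 50, tol: float = 1e-12
) -> Tuple[float, float]:
    """Return (a, b) from logit P(y=1) = a + b * logit(p); b is the slope."""
    z = [_logit(pi) for pi in p]
    a, b = 0.0, 1.0  # start at perfect calibration
    for _ in range(n_iter):
        # Score (ga, gb) and 2x2 information matrix [[haa, hab], [hab, hbb]].
        ga = gb = haa = hab = hbb = 0.0
        for zi, yi in zip(z, y):
            mu = _sigmoid(a + b * zi)
            v = mu * (1.0 - mu)
            ga += yi - mu
            gb += (yi - mu) * zi
            haa += v
            hab += v * zi
            hbb += v * zi * zi
        det = haa * hbb - hab * hab
        # Newton step: (a, b) += inverse(information) @ score.
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def calibration_intercept(
    y: Sequence[int], p: Sequence[float], n_iter: int = 50, tol: float = 1e-12
) -> float:
    """Return a from logit P(y=1) = a + logit(p), i.e. with the slope fixed at 1."""
    z = [_logit(pi) for pi in p]
    a = 0.0
    for _ in range(n_iter):
        mus = [_sigmoid(a + zi) for zi in z]
        score = sum(yi - mu for yi, mu in zip(y, mus))
        info = sum(mu * (1.0 - mu) for mu in mus)
        step = score / info
        a += step
        if abs(step) < tol:
            break
    return a
```

Under these definitions, a slope above 1 (as reported for gradient boosting) means the predicted risks are not spread out enough relative to observed risk, while a slope below 1 (as reported for conventional logistic regression) means they are too extreme.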
