Gene expression profiles (GEPs) are commonly used to guide decisions on adjuvant chemotherapy in ER+/HER2- breast cancer (luminal BC). We aimed to improve on available tools with a machine learning model that leverages both gene expression and clinical data. Using the R package MetaGxBreast, we selected patients with early-stage luminal BC who had not received chemotherapy. Four commercially available GEPs (70-gene signature, PAM50 risk of recurrence score, 21-gene recurrence score [RS], and 12-gene score), the 14-gene immunoglobulin (IGG) signature, and a clinical composite risk score (CCRS) summarizing age, nodal status, tumour size, and grade were calculated for each patient. Multivariable Cox regression evaluated their independent association with distant metastasis-free survival (DMFS). Using the most informative variables, we developed a random survival forest (RSF) model, a nonparametric ensemble learner for survival data. An optimal cutoff for the model's risk predictions was determined and evaluated on a pooled external validation set of three independent in-house cohorts.
The training set comprised 1694 luminal BC patients. Each of the five signatures was independently prognostic after adjustment for clinical variables. Multivariable Cox regression showed independent prognostic associations for the 21-gene RS (HR 2.84, 95% CI 1.79–4.5, p<0.001), the 14-gene IGG signature (HR 10.3, 95% CI 4.35–24.4, p<0.001), and the CCRS (HR 5.76, 95% CI 3.6–9.2, p<0.001). With these three scores as features, the RSF model achieved a C-index of 0.95 and an integrated cumulative/dynamic AUC of 0.83. Applying the optimal cutoff to the external validation set (n=269) identified 40.5% of the cohort as a group with a 15-year DMFS of 90.9%. By comparison, the RS low-risk category (RS<11) isolated 14.4% of the cohort, with a 15-year DMFS of 91.7%.
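For readers unfamiliar with the C-index reported above, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is illustrative only: the function name, the toy follow-up times, event flags, and risk scores are all hypothetical, and the abstract does not state which C-index variant or software the authors used.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable patient pairs ranked correctly by risk.

    times  -- follow-up time per patient
    events -- 1 if distant metastasis was observed, 0 if censored
    risks  -- model risk score (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is anchored on a patient with an observed event
        for j in range(n):
            # j is comparable if still event-free after i's event time
            # (pairs with tied times are skipped for simplicity)
            if times[j] > times[i]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort (not data from the study)
toy_times = [2.0, 5.0, 7.0, 10.0, 12.0]
toy_events = [1, 1, 0, 1, 0]
toy_risks = [0.9, 0.3, 0.4, 0.5, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by time to event.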
We developed and validated a clinically feasible machine learning model for early-stage luminal BC, integrating gene expression and clinical data, that identified a larger ultra-low risk population than currently achieved by commercial tools. Further studies are needed to validate its analytical accuracy and clinical utility.
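The low-risk group comparison (share of the cohort below a cutoff and its DMFS at a fixed horizon) can be sketched as follows. This is a hedged illustration, not the authors' procedure: the helper names, the cutoff value, the toy data, and the assumption that scores below the cutoff denote low risk are all hypothetical, and a plain Kaplan-Meier estimator is assumed for the survival estimate.

```python
def kaplan_meier_at(times, events, horizon):
    """Kaplan-Meier estimate of event-free survival at `horizon`."""
    survival = 1.0
    # distinct times at which an event was observed, up to the horizon
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        failed = sum(1 for u, e in zip(times, events) if e and u == t)
        survival *= 1.0 - failed / at_risk
    return survival


def low_risk_summary(times, events, risks, cutoff, horizon):
    """Return (fraction of cohort below cutoff, KM survival of that group)."""
    idx = [i for i, r in enumerate(risks) if r < cutoff]
    if not idx:
        raise ValueError("no patients below the cutoff")
    fraction = len(idx) / len(risks)
    surv = kaplan_meier_at(
        [times[i] for i in idx], [events[i] for i in idx], horizon
    )
    return fraction, surv


# Hypothetical toy cohort: follow-up in years, 1 = distant metastasis observed
toy_times = [3, 16, 8, 17, 15.5, 12, 18, 6, 20, 16]
toy_events = [1, 0, 1, 0, 0, 1, 0, 1, 0, 0]
toy_risks = [0.9, 0.1, 0.8, 0.2, 0.6, 0.3, 0.15, 0.7, 0.05, 0.25]
toy_fraction, toy_dmfs = low_risk_summary(
    toy_times, toy_events, toy_risks, cutoff=0.35, horizon=15
)
```

Running the same summary for two different risk scores on one cohort gives the kind of side-by-side figures quoted above: group size on one axis, long-term DMFS on the other.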
Sarafidis et al. studied this question.