A gene pair-based machine learning model for breast cancer predicted 5-year overall survival within 8 percentage points of observed outcomes, with treatment category explaining 72% of the variance in predicted survival.
Does a gene pair-based machine learning model accurately predict 5-year overall survival in metastatic breast cancer patients receiving NGS-directed therapy?
A gene pair-based machine learning model demonstrated strong internal consistency and predicted 5-year overall survival within 8 percentage points of observed outcomes (70.0% predicted vs 62.9% observed) in metastatic breast cancer patients.
Abstract
Introduction: Drug resistance in breast cancer (BC) can arise from synergistic genetic alterations. We previously developed a machine learning (ML) model that ranks treatment categories by 5-year predicted overall survival (POS) based on altered gene pairs. Here, we perform the first retrospective validation of this model using an independent BC patient cohort that received next-generation sequencing (NGS)-directed therapy.
Methods: Thirty metastatic BC patients who received NGS-directed therapy with FoundationOne Companion Diagnostic (CDx) profiling were analyzed (PMID: 34572791). For each patient, logit scores (representing the probability of death at 5 years) were generated across 8 treatment categories combined with altered gene pair combinations from the CDx gene-set. Inverse logits, representing POS, were computed using a virtual clinical trial in which a decoder model received patient data and output probability distributions for each gene pair. Synthetic patient populations were then randomly generated from these probabilities and used to calculate POS rates for each treatment category per patient. Mean inverse logits were analyzed per patient using repeated-measures ANOVA, Friedman tests and pairwise Holm t-tests. The Shapiro-Wilk test assessed normality of the logits, and η² values assessed strength of association.
Results: Across all patients, the POS rate was 70.0%, compared to the actual rate of 62.9%. A one-sample z-test indicated a significant difference (Z=6.82, p<0.001), although the model’s predictions were directionally aligned with actual outcomes. Shapiro-Wilk testing indicated that 79.2% of treatment categories across all patients had W>0.9, suggesting an approximately normal distribution of logits. Repeated-measures ANOVA confirmed significant treatment-dependent differences in logits for all 30 patients (p<0.001, average η²=0.72). This indicates that 72% of the variance in POS is explained by the treatment category, after accounting for differences in gene pair alterations across patients. Radiation and tyrosine kinase inhibitors were the treatment categories that consistently ranked highest across patients, while PI3K inhibitors and DNA damage agents consistently ranked lowest. Pairwise Holm t-tests indicated that metabolic agents and receptor tyrosine kinase inhibitors consistently showed no significant difference in POS among patients who received either treatment in real life (p>0.05).
Conclusion: Our findings demonstrate strong internal consistency for an ML-based approach to predicting survival from gene pair and treatment data. Promising external validity is shown by POS within 8 percentage points of observed outcomes. Further calibration and validation of this model with subtype-stratified cohorts is warranted to improve validity, enhance clinical utility and maintain predictive stability.
Citation Format: Rishi Nair, Nicholas R. Mistry, Roy Khalife, Anthony M. Magliocco. Validation of a gene pair-based machine learning model for treatment prioritization in breast cancer [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 67.
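Two of the statistical steps named in the abstract, the one-sample z-test on the survival rate and the Holm adjustment applied to pairwise t-tests, can be sketched as follows. This is a minimal illustration and not the authors' code: the function names are hypothetical, the observed rate is assumed to serve as the reference proportion, and the sample size `n` is an assumed input because the abstract does not report the size of the synthetic populations. The p-values passed to the Holm step are placeholders, not study results.

```python
import math


def one_sample_proportion_z(p_hat, p0, n):
    """z statistic for a proportion p_hat from n subjects against reference p0."""
    standard_error = math.sqrt(p0 * (1.0 - p0) / n)
    return (p_hat - p0) / standard_error


def two_sided_p_value(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, index in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[index])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[index] = running_max
    return adjusted


if __name__ == "__main__":
    predicted_rate = 0.700  # mean POS reported in the abstract
    observed_rate = 0.629   # actual 5-year survival reported in the abstract
    for n in (30, 1000):    # hypothetical sample sizes, not from the abstract
        z = one_sample_proportion_z(predicted_rate, observed_rate, n)
        print(f"n={n}: z={z:.2f}, p={two_sided_p_value(z):.4f}")

    placeholder_p = [0.01, 0.04, 0.03]  # illustrative values only
    print([round(p, 3) for p in holm_adjust(placeholder_p)])
```

Because the standard error shrinks with the square root of `n`, the same 7.1-point gap between predicted and observed survival yields a much larger z statistic for a large synthetic population than for 30 patients, so the reported significance depends on the population size used in the test.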