An XGBoost model integrating Amerindian ancestry, clinical data, and transcriptomic features predicted neoadjuvant chemotherapy response in Colombian breast cancer patients with an AUC of 0.90.
Does a machine learning model integrating ancestry and transcriptomic features predict neoadjuvant chemotherapy response in Colombian breast cancer patients?
Machine learning integration of ancestry and transcriptomic features accurately predicts neoadjuvant chemotherapy response in Colombian breast cancer patients, highlighting the importance of population-specific factors.
Abstract
Background: Breast cancer resistance to neoadjuvant chemotherapy (NAC) remains a major challenge in Latin America, where limited genomic representation restricts precision-oncology advances. Understanding how genetic ancestry interacts with transcriptomic features may uncover population-specific predictors of treatment response. We investigated the ancestry-transcriptome link using machine-learning models in Colombian breast cancer patients.
Methods: We analyzed 58 women with locally advanced breast cancer treated with NAC (29 responders, 29 non-responders) across five molecular subtypes. RNA-seq identified 339 differentially expressed genes (DEGs); the top 10% most variable DEGs (n=34) were retained following variance-stabilizing normalization. Predictors included clinical variables (tumor size, TNM stage, T-stage, N-stage, grade, clinical stage, treatment regimen, age, BMI, menopause), genetic ancestry fractions (Amerindian-AMR, African-AFR, European-EUR), and 34 DEGs. Recursive Feature Elimination did not improve model performance; therefore, all variables were included. Random Forest (500 trees) and XGBoost models were trained, with hyperparameter optimization via cross-validation.
Results: XGBoost achieved the highest performance (AUC = 0.90) using a learning rate of 0.05, depth 12, and 90% subsample/colsample. Across models, T-stage, age, and Amerindian ancestry consistently emerged as top predictors based on gain, coverage, and split frequency. Among transcriptomic variables, CACNA1D, CLEC3A, TFF1, and TTK showed the strongest predictive contribution. Model robustness was confirmed through parameter variation and resampling strategies.
Conclusions: Machine-learning integration of ancestry and transcriptomic features accurately predicts NAC response in Colombian breast cancer patients. Amerindian ancestry, alongside key clinical variables and reproducible gene-expression signatures, influenced prediction performance, underscoring the importance of population-specific factors in treatment resistance. This ancestry-transcriptome framework provides a scalable, data-driven approach for advancing precision oncology in underrepresented Latin American populations.
Citation Format: Michelle Guevara-Nieto, María J. López-Munevar, Carlos Orozco-Castaño, Rafael Parra-Medina, Laura Fejerman, Valentina Zavala, Jone Garai, Jovanny Zabaleta, Alba L. Combita-Rojas, Liliana López-Kleine. The ancestry-transcriptome link: Machine learning predicts chemotherapy response in breast cancer [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 4206.
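The variance filter and the AUC metric described in the Methods and Results can be sketched in a few lines of Python. This is a minimal, standard-library-only illustration and not the authors' code: the function names `top_variable_features` and `roc_auc`, the toy inputs, and the mapping of "90% subsample/colsample" onto XGBoost's `subsample` and `colsample_bytree` parameters are assumptions made here. The abstract does not state which colsample variant, tree count, or cross-validation scheme was used for XGBoost, so those are left out.

```python
import statistics


def top_variable_features(expr, fraction=0.10):
    """Return the names of the most variable features.

    `expr` maps a feature name to its per-sample values, assumed to be
    already normalized (the abstract uses variance-stabilizing normalization).
    The number kept is round(fraction * n_features), with a minimum of 1.
    """
    n_keep = max(1, round(fraction * len(expr)))
    ranked = sorted(expr, key=lambda name: statistics.pvariance(expr[name]),
                    reverse=True)
    return ranked[:n_keep]


def roc_auc(labels, scores):
    """AUC as the fraction of (responder, non-responder) pairs in which the
    responder receives the higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hyperparameters reported in the abstract, written with XGBoost's
# scikit-learn-style parameter names. Treating "colsample" as
# colsample_bytree is an assumption; the abstract does not say which one.
XGB_PARAMS = {
    "learning_rate": 0.05,
    "max_depth": 12,
    "subsample": 0.9,
    "colsample_bytree": 0.9,
}
```

With 339 input features and `fraction=0.10`, `top_variable_features` keeps 34 names, which matches the count reported in the abstract. A dictionary such as `XGB_PARAMS` could be unpacked into `xgboost.XGBClassifier(**XGB_PARAMS)` if that library is installed.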