An XGBoost model integrating Amerindian ancestry, clinical data, and transcriptomic features predicted neoadjuvant chemotherapy response in Colombian breast cancer patients with an AUC of 0.90.
Does a machine learning model integrating ancestry and transcriptomic features predict neoadjuvant chemotherapy response in Colombian breast cancer patients?
58 Colombian women with locally advanced breast cancer treated with neoadjuvant chemotherapy (29 responders, 29 non-responders) across five molecular subtypes.
Machine learning models (Random Forest, XGBoost) integrating clinical variables, genetic ancestry fractions, and 34 differentially expressed genes to predict treatment response.
Prediction of neoadjuvant chemotherapy response (measured by AUC).
Machine learning integration of ancestry and transcriptomic features accurately predicts neoadjuvant chemotherapy response in Colombian breast cancer patients, highlighting the importance of population-specific factors.
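Since the outcome above is reported as an AUC, a short sketch of what that number measures may help. The code below is an illustration only, not the authors' evaluation code: `roc_auc` is a hypothetical helper name, and the labels and scores are made-up toy values, not study data. It computes AUC as the fraction of responder/non-responder pairs in which the responder receives the higher predicted score, counting ties as half.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) AUC for binary labels (1 = responder, 0 = non-responder)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample from each class")
    # Count responder/non-responder pairs ranked correctly; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (invented values): 3 of the 4 pairs are ranked correctly.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

Under this reading, an AUC of 0.90 with 29 responders and 29 non-responders would mean that roughly 90% of the 841 possible responder/non-responder pairs are ordered correctly by the model's scores.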
Abstract
Background: Breast cancer resistance to neoadjuvant chemotherapy (NAC) remains a major challenge in Latin America, where limited genomic representation restricts precision-oncology advances. Understanding how genetic ancestry interacts with transcriptomic features may uncover population-specific predictors of treatment response. We investigated the ancestry-transcriptome link using machine-learning models in Colombian breast cancer patients.
Methods: We analyzed 58 women with locally advanced breast cancer treated with NAC (29 responders, 29 non-responders) across five molecular subtypes. RNA-seq identified 339 differentially expressed genes (DEGs); the top 10% most variable DEGs (n=34) were retained following variance-stabilizing normalization. Predictors included clinical variables (tumor size, TNM stage, T-stage, N-stage, grade, clinical stage, treatment regimen, age, BMI, menopause), genetic ancestry fractions (Amerindian-AMR, African-AFR, European-EUR), and 34 DEGs. Recursive Feature Elimination did not improve model performance; therefore, all variables were included. Random Forest (500 trees) and XGBoost models were trained, with hyperparameter optimization via cross-validation.
Results: XGBoost achieved the highest performance (AUC = 0.90) using a learning rate of 0.05, depth 12, and 90% subsample/colsample. Across models, T-stage, age, and Amerindian ancestry consistently emerged as top predictors based on gain, coverage, and split frequency. Among transcriptomic variables, CACNA1D, CLEC3A, TFF1, and TTK showed the strongest predictive contribution. Model robustness was confirmed through parameter variation and resampling strategies.
Conclusions: Machine-learning integration of ancestry and transcriptomic features accurately predicts NAC response in Colombian breast cancer patients. Amerindian ancestry, alongside key clinical variables and reproducible gene-expression signatures, influenced prediction performance, underscoring the importance of population-specific factors in treatment resistance. This ancestry-transcriptome framework provides a scalable, data-driven approach for advancing precision oncology in underrepresented Latin American populations.
Citation Format: Michelle Guevara-Nieto, María J. López-Munevar, Carlos Orozco-Castaño, Rafael Parra-Medina, Laura Fejerman, Valentina Zavala, Jone Garai, Jovanny Zabaleta, Alba L. Combita-Rojas, Liliana López-Kleine. The ancestry-transcriptome link: Machine learning predicts chemotherapy response in breast cancer [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 4206.
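The Methods and Results in the abstract describe two concrete steps that can be sketched in code: keeping the top 10% most variable DEGs, and fitting XGBoost with the reported settings. The following is a rough sketch under stated assumptions, not the authors' pipeline. The helper names (`top_variable_genes`, `fit_and_rank`) are hypothetical; rounding the 10% cutoff up is an assumption chosen because it reproduces 34 of 339; mapping "colsample" to `colsample_bytree` is an assumption; and `n_estimators` is a placeholder, since the abstract does not report a tree count for XGBoost. The `xgboost` package is imported only inside `fit_and_rank`, which expects a pandas DataFrame of predictors.

```python
import math
import random
from statistics import pvariance

# Values quoted in the abstract, mapped onto xgboost's scikit-learn API.
XGB_PARAMS = {
    "learning_rate": 0.05,
    "max_depth": 12,
    "subsample": 0.9,
    "colsample_bytree": 0.9,  # assumption: "colsample" read as per-tree sampling
    "n_estimators": 200,      # placeholder: not reported in the abstract
    "eval_metric": "auc",
}


def top_variable_genes(expr, fraction=0.10):
    """Return the most variable genes.

    expr maps gene name -> list of normalized expression values (one per sample).
    The cutoff is rounded up (an assumption), so 10% of 339 genes gives 34.
    """
    n_keep = math.ceil(fraction * len(expr))
    ranked = sorted(expr, key=lambda gene: pvariance(expr[gene]), reverse=True)
    return ranked[:n_keep]


def fit_and_rank(X, y):
    """Fit XGBoost and return importances by gain, cover, and weight.

    X is assumed to be a pandas DataFrame (so importance keys are column
    names) and y a 0/1 response vector. Requires the xgboost package.
    """
    import xgboost as xgb  # third-party dependency, imported lazily

    model = xgb.XGBClassifier(**XGB_PARAMS)
    model.fit(X, y)
    booster = model.get_booster()
    # "weight" is xgboost's name for split frequency; "cover" for coverage.
    return {
        kind: booster.get_score(importance_type=kind)
        for kind in ("gain", "cover", "weight")
    }


# Synthetic stand-in data (not study data): 339 genes across 58 samples.
rng = random.Random(0)
synthetic_expr = {
    f"gene_{i}": [rng.gauss(0.0, 1.0 + i / 100) for _ in range(58)]
    for i in range(339)
}
selected_genes = top_variable_genes(synthetic_expr)
```

With 58 patients and a maximum depth of 12, any such model would need the cross-validation and resampling checks the abstract mentions; the sketch does not attempt to reproduce them.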
Michelle Guevara-Nieto
María J. López-Munevar
Carlos A. Orozco
Cancer Research
University of California, Davis
University of Chile
University of New Orleans
Guevara-Nieto et al. reported that an XGBoost model integrating Amerindian ancestry, clinical data, and transcriptomic features predicted neoadjuvant chemotherapy response in Colombian breast cancer patients with an AUC of 0.90.
www.synapsesocial.com/papers/69d1fd9ca79560c99a0a3c10 — DOI: https://doi.org/10.1158/1538-7445.am2026-4206