Abstract
Non-small cell lung cancer (NSCLC) is a genetic disease characterized by an abundance of somatic mutations. Among the many mutations that make up the somatic mutation landscape, only a select few actively drive the development of cancer. Mutations in driver genes have gained importance as therapeutic targets, as biomarkers, and as aids to understanding oncogenesis. Recent studies have also shown that certain drivers predominate in certain demographic groups, as is the case for targetable EGFR mutations in female non-smokers of Asian descent. In this study, we sought to further clarify and define the somatic mutation landscape of NSCLC using secondary DNA-sequencing data from 2,551 patients aggregated from various cohorts. We also used machine learning-based methods to impute smoking status and genetic ancestry. This created a broad and deep cohort in which to identify unique associations between somatic features and specific populations. We aggregated and processed the genomic data into highly granular somatic mutation features and identified 87 significantly mutated genes (SMG), 48 of which are putatively novel driver genes. Patients of East Asian (EAS) ancestry had fewer SMG per patient and a significantly lower tumor mutation burden than patients of other genetic ancestries. This remained true even after stratifying the study populations by smoking status and cancer subtype. Besides associations with EGFR, multivariable models also identified more frequent ATM and STK11 mutations in EAS patients. Unsupervised clustering on the 87 SMG and 109 mutational signatures identified 6 distinct clusters, including 2 previously described, distinct KRAS-featuring clusters. Some clusters shared common clinical features; for example, female EAS non-smokers were particularly enriched in one cluster.
Cox regression models identified better survival in EAS patients and in patients with EGFR mutations, whereas patients with mutations in TP53, KEAP1, and BID had significantly poorer survival. These results reveal a clear disparity in the somatic mutation landscape of lung cancer patients of different ancestries, which should be taken into account in future studies. They also highlight two benefits: using large, heterogeneous datasets of a specific cancer type for driver gene discovery, and integrating multiple somatic features. Finally, this study demonstrates the value of data imputation, which allowed us to include 942 patients whose smoking status was otherwise unknown.

Citation Format: Isam Mohd-Ibrahim, Zhuokun Feng, Yu Chen, Lauren Higa, Youping Deng, . Mapping the imputation-augmented somatic mutation landscape of 2,551 NSCLC patients highlights contrasting patterns across histologic subtype, smoking status, and ancestry [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 4178.
Isam Mohd-Ibrahim
Zhuokun Feng
Yu Chen
Cancer Research
University of Hawaiʻi at Mānoa
www.synapsesocial.com/papers/69d1fdbfa79560c99a0a3ff5 — DOI: https://doi.org/10.1158/1538-7445.am2026-4178