This study explores a new paradigm for applying single nucleotide polymorphisms (SNPs) to the precision treatment of lung adenocarcinoma by integrating genomics with machine learning. The cohort comprised 83 patients with pathologically confirmed lung adenocarcinoma. Clinical features and SNP genotype data were integrated, and a gradient boosting decision tree (GBDT) algorithm was used to build the SNPdriver prediction framework, which adaptively learns nonlinear interaction effects between SNP features to produce a binary classification of driver gene mutation status. The 83 patients were randomly divided into two groups at a 7:3 ratio, with no significant difference in baseline characteristics between the groups (p > 0.05). The GBDT-based SNPdriver model uses an ensemble of six decision trees and predicts mutation status by weighting the outputs of the feature split paths. In validation, the area under the curve (AUC) was 0.90 for EGFR mutations and 0.85 for KRAS mutations, and the calibration curve showed that the predicted probabilities were highly consistent with the observed incidence. The SNPdriver model, built on SNP feature networks, thus predicts driver gene mutations in lung adenocarcinoma, and its high discriminatory power and clinical consistency support the potential of SNPs as multi-gene co-regulatory biomarkers.
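The workflow summarized above (a 7:3 random split, a six-tree boosted ensemble, and AUC-based evaluation) can be illustrated with a minimal sketch. This is not the SNPdriver implementation: the genotype matrix and labels are synthetic, the treatment of the smaller group as a held-out set is an assumption, all function names are hypothetical, and each "tree" is simplified to a depth-1 stump, which cannot capture the SNP interaction effects that deeper trees would model.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, grad, hess):
    """Pick the (SNP, threshold) split that best fits the gradients."""
    best = None
    for j in range(len(X[0])):
        for t in (0.5, 1.5):  # genotypes are coded 0/1/2
            left = [i for i, x in enumerate(X) if x[j] <= t]
            right = [i for i, x in enumerate(X) if x[j] > t]
            if not left or not right:
                continue
            sse = 0.0
            for idx in (left, right):
                m = sum(grad[i] for i in idx) / len(idx)
                sse += sum((grad[i] - m) ** 2 for i in idx)
            if best is None or sse < best[0]:
                best = (sse, j, t, left, right)
    if best is None:
        return None
    _, j, t, left, right = best

    def leaf(idx):
        # Newton-style leaf value for the logistic loss
        return sum(grad[i] for i in idx) / max(sum(hess[i] for i in idx), 1e-9)

    return j, t, leaf(left), leaf(right)

def fit_gbdt(X, y, n_trees=6, lr=0.5):
    """Gradient boosting on the log-odds scale with n_trees stumps."""
    p0 = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
    f0 = math.log(p0 / (1 - p0))
    scores = [f0] * len(y)
    trees = []
    for _ in range(n_trees):
        probs = [sigmoid(s) for s in scores]
        grad = [yi - pi for yi, pi in zip(y, probs)]
        hess = [pi * (1 - pi) for pi in probs]
        stump = fit_stump(X, grad, hess)
        if stump is None:
            break
        j, t, vl, vr = stump
        trees.append(stump)
        scores = [s + lr * (vl if x[j] <= t else vr) for s, x in zip(scores, X)]
    return f0, lr, trees

def predict_proba(model, x):
    f0, lr, trees = model
    return sigmoid(f0 + sum(lr * (vl if x[j] <= t else vr) for j, t, vl, vr in trees))

def auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic stand-in data (assumption): 83 patients, 20 SNPs, and a mutation
# label loosely driven by the first two SNPs plus noise.
rng = random.Random(0)
X = [[rng.choice([0, 1, 2]) for _ in range(20)] for _ in range(83)]
y = [1 if x[0] + x[1] + rng.gauss(0, 0.7) >= 2 else 0 for x in X]

# 7:3 random split (58 training patients, 25 held out).
order = list(range(83))
rng.shuffle(order)
n_train = int(0.7 * 83)
train_X = [X[i] for i in order[:n_train]]
train_y = [y[i] for i in order[:n_train]]
test_X = [X[i] for i in order[n_train:]]
test_y = [y[i] for i in order[n_train:]]

model = fit_gbdt(train_X, train_y, n_trees=6)
test_auc = auc(test_y, [predict_proba(model, x) for x in test_X])
print(f"held-out AUC on synthetic data: {test_auc:.2f}")
```

The printed AUC reflects only the synthetic data and has no bearing on the reported values of 0.90 and 0.85. In practice, an established library implementation with deeper trees and a calibration check would be a more natural choice than this hand-rolled version.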
Li et al. studied this question.