Background/Objectives: The precise classification of non-small-cell lung cancer (NSCLC) into lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) plays an important role in treatment decisions and prognosis. Accurate subtyping helps ensure that patients receive the most appropriate therapy and allows clinicians to make informed assessments of likely disease outcomes. This study presents a deep neural-network-based classification approach that uses genome-wide DNA methylation profiles from the Illumina HumanMethylation450 BeadChip platform. Methods: The 5000 most discriminative CpG probes were identified through variance-based feature selection and then used as input to a five-layer deep neural network with batch normalization and dropout regularization. Training and validation were performed using data from The Cancer Genome Atlas (TCGA), with external validation conducted on two independent Gene Expression Omnibus (GEO) datasets: GSE39279 and GSE56044. Results: The model achieved 96.92% accuracy with an area under the receiver operating characteristic curve (AUC-ROC) of 0.9981 on the TCGA test set. The model also generalized well in cross-dataset validation: the GEO-trained model achieved 88.92% accuracy and an AUC-ROC of 0.9724 when validated on TCGA data. The CpG biomarkers that contributed most to classification decisions were analysed using SHAP (SHapley Additive exPlanations). Conclusions: These findings demonstrate the potential of DNA methylation-based deep learning approaches for reliable NSCLC subtype classification with clinical applicability.
Almufareh et al. studied this question.
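
The variance-based feature selection named in the Methods can be illustrated with a minimal sketch. The abstract does not specify how the variance was computed or on which samples, so the following is an assumption rather than the authors' procedure: it ranks probes by the variance of their beta values across samples, ignoring missing values, and keeps the top k. The function name `select_top_variance_probes` and the toy matrix are hypothetical and serve only as illustration.

```python
import numpy as np


def select_top_variance_probes(beta, k):
    """Return column indices of the k highest-variance probes, highest first.

    beta: 2-D array of methylation beta values, shape (samples, probes).
    Assumption: only training samples are passed in, so that the selection
    does not see held-out data.
    """
    if not 0 < k <= beta.shape[1]:
        raise ValueError("k must be between 1 and the number of probes")
    # nanvar skips missing beta values when computing per-probe variance
    variances = np.nanvar(beta, axis=0)
    # A stable sort on the negated variances keeps the lower index first on ties
    order = np.argsort(-variances, kind="stable")
    return order[:k]


# Toy matrix: 6 samples x 4 probes (values are made up for illustration)
beta = np.array([
    [0.48, 0.05, 0.50, 0.30],
    [0.52, 0.95, 0.50, 0.70],
    [0.50, 0.10, 0.50, 0.30],
    [0.49, 0.90, 0.50, 0.70],
    [0.51, 0.05, 0.50, 0.30],
    [0.50, 0.95, 0.50, 0.70],
])

top = select_top_variance_probes(beta, k=2)
selected = beta[:, top]  # reduced matrix that a classifier would receive
```

In the setting described above, `k` would be 5000 and the matrix would hold the 450K-array probes. The same column indices would then be applied unchanged to any validation or external dataset so that all inputs share one feature space.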