What question did this study set out to answer?

This study aims to classify non-small-cell lung cancer into adenocarcinoma and squamous cell carcinoma using DNA methylation profiles.

February 14, 2026 | Open Access

Deep-Learning-Based Classification of Lung Adenocarcinoma and Squamous Cell Carcinoma Using DNA Methylation Profiles: A Multi-Cohort Validation Study

Key Points

Aimed to classify non-small-cell lung cancer (NSCLC) into adenocarcinoma and squamous cell carcinoma using DNA methylation profiles.
Used genome-wide DNA methylation data from the Illumina HumanMethylation450 BeadChip platform.
Identified 5000 discriminative CpG probes using variance-based feature selection.
Employed a five-layer deep neural network with batch normalization and dropout regularization for classification.
Conducted training and validation using data from The Cancer Genome Atlas (TCGA) and two GEO datasets.
Used SHAP to analyze the CpG biomarkers that most influence classification decisions.
Achieved 96.92% accuracy on the TCGA test set with an AUC-ROC of 0.9981.
GEO-trained model reached 88.92% accuracy and 0.9724 AUC-ROC when validated on TCGA data.
Demonstrated robust generalization across different datasets.
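
The variance-based feature selection mentioned above can be illustrated with a minimal sketch. It assumes that "variance-based" means ranking each CpG probe by the variance of its beta values across samples and keeping the top k (k = 5000 in the study). The function name, probe IDs, and toy values below are hypothetical and are not taken from the paper, and the summary does not say whether variance was computed on training samples only.

```python
from statistics import pvariance


def top_variance_probes(beta, probe_ids, k):
    """Return the k probe IDs whose beta values vary most across samples.

    beta: list of samples, each a list of beta values (one per probe).
    This is an illustrative assumption about the selection step, not the
    authors' published code.
    """
    if not beta or any(len(row) != len(probe_ids) for row in beta):
        raise ValueError("every sample needs exactly one value per probe")
    # Variance of each probe (column) across all samples (rows).
    variances = [pvariance(column) for column in zip(*beta)]
    # Rank probe indices by variance, largest first.
    ranked = sorted(range(len(probe_ids)),
                    key=lambda j: variances[j], reverse=True)
    return [probe_ids[j] for j in ranked[:k]]


# Toy data: 4 samples x 3 hypothetical probes.
beta = [
    [0.10, 0.50, 0.90],
    [0.12, 0.55, 0.10],
    [0.11, 0.45, 0.85],
    [0.09, 0.60, 0.15],
]
selected = top_variance_probes(beta, ["cg_a", "cg_b", "cg_c"], k=2)
print(selected)
```

Note that variance ranking of this kind does not use the LUAD/LUSC labels, so in practice it would typically be fitted on training samples only to avoid leaking information from the test set.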

Abstract

Background/Objectives: The precise classification of non-small-cell lung cancer (NSCLC) into lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) plays an important role in treatment decisions and prognosis. Accurate subtyping helps ensure that patients receive the most appropriate therapy and allows clinicians to make informed assessments of likely disease outcomes. This study presents a deep-neural-network-based classification approach that uses genome-wide DNA methylation profiles from the Illumina HumanMethylation450 BeadChip platform. Methods: The 5000 most discriminative CpG probes were identified through variance-based feature selection and then used as input to a five-layer deep neural network with batch normalization and dropout regularization. Training and validation were performed using data from The Cancer Genome Atlas (TCGA), with external validation conducted on two independent Gene Expression Omnibus (GEO) datasets: GSE39279 and GSE56044. Results: The model achieved 96.92% accuracy with an area under the receiver operating characteristic curve (AUC-ROC) of 0.9981 on the TCGA test set. The approach also generalized well in cross-dataset validation experiments, with the GEO-trained model achieving 88.92% accuracy and 0.9724 AUC-ROC when validated on TCGA data. The CpG biomarkers most influential in classification decisions were analyzed using SHAP (SHapley Additive exPlanations). Conclusions: These findings demonstrate the potential of DNA methylation-based deep learning approaches for reliable NSCLC subtype classification with clinical applicability.
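
For readers less familiar with the two metrics reported in the Results, the sketch below shows how accuracy and AUC-ROC can be computed for a binary LUAD/LUSC classifier. It uses the pairwise (Mann-Whitney) formulation of AUC-ROC. The label coding (1 = LUAD, 0 = LUSC), the 0.5 decision threshold, and the toy scores are assumptions made for illustration and are not values from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a randomly chosen positive sample
    scores higher than a randomly chosen negative one (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example with an assumed coding: 1 = LUAD, 0 = LUSC.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]  # hypothetical model outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)
auc = auc_roc(y_true, scores)
print(f"accuracy={acc:.4f}, AUC-ROC={auc:.4f}")
```

Accuracy depends on the chosen threshold, whereas AUC-ROC summarizes ranking quality across all thresholds, which is why the two reported figures can differ substantially for the same model.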
