BACKGROUND Non-small cell lung cancer (NSCLC) is one of the most common cancers and a leading cause of cancer-related mortality, making prognostic prediction clinically essential. Machine learning models are increasingly used to assess prognosis; however, developing systems that combine high discrimination with clear, clinically interpretable reasoning remains challenging.

OBJECTIVE To develop deep learning models that predict 5-year mortality in NSCLC using data from the Korea Central Cancer Registry (KCCR) and to quantify feature importance through permutation testing.

METHODS We identified patients diagnosed between 2014 and 2017 who had complete clinical data, pulmonary function test results, histological information, genomic data, and staging details. After preprocessing, the cohort was divided into stratified training, validation, and test sets in a 70%:15%:15% ratio. Five models were tuned using Hyperband across ten predefined feature groups. The primary evaluation metric was the area under the receiver operating characteristic curve (AUC); accuracy, F1 score, precision, and recall were also reported. Group-wise permutation importance was calculated for each model, and the concordance of importance rankings across models was assessed using the Friedman test. A Cox proportional hazards (CPH) model was used as a baseline comparator.

RESULTS All five models yielded comparable discrimination on the test set (AUC 0.875–0.879; accuracy 0.796–0.822; F1 0.815–0.846). Permuting the 'Stage' group produced the largest decrease in AUC, followed by 'Pulmonary Function Test', 'Symptoms', and 'Age'. The 'Gene Mutation' group had a modest overall impact but became more influential within the adenocarcinoma subset. The Friedman test showed no statistically significant differences in importance rankings across the models (p = .928).

CONCLUSIONS A carefully tuned, grouped-input deep learning framework offered reliable and interpretable predictions of 5-year mortality in NSCLC. Group-level permutation importance provided stable and reproducible insights into the clinical factors influencing risk, which may guide future model refinement and clinical decision-making.
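The group-wise permutation importance named in METHODS can be illustrated with a minimal sketch. The abstract does not give the authors' implementation, so everything below is an assumption made for illustration: the function names (`auc`, `group_permutation_importance`), the number of repeats, the toy scoring function, and the synthetic data, none of which comes from the registry. The idea shown is that all columns belonging to one feature group are shuffled together across patients, and the importance of the group is the resulting mean drop in AUC.

```python
import random


def auc(y_true, scores):
    """Rank-based AUC: probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def group_permutation_importance(predict, X, y, groups, n_repeats=10, seed=0):
    """Mean drop in AUC when every column of a group is shuffled jointly across rows."""
    rng = random.Random(seed)
    baseline = auc(y, predict(X))
    importance = {}
    for name, cols in groups.items():
        drops = []
        for _ in range(n_repeats):
            perm = list(range(len(X)))
            rng.shuffle(perm)
            X_perm = [row[:] for row in X]  # copy so the original data is untouched
            for i, j in enumerate(perm):
                for c in cols:
                    # Row i receives the whole group from row j, which keeps
                    # within-group correlations intact.
                    X_perm[i][c] = X[j][c]
            drops.append(baseline - auc(y, predict(X_perm)))
        importance[name] = sum(drops) / n_repeats
    return importance


# Synthetic toy data (hypothetical): columns 0 and 1 drive the label, column 2 is noise.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random(), data_rng.random()] for _ in range(200)]
y = [1 if row[0] + row[1] > 1.0 else 0 for row in X]


def toy_predict(rows):
    """Stand-in for a trained model: the risk score uses only columns 0 and 1."""
    return [r[0] + r[1] for r in rows]


groups = {"informative": [0, 1], "noise": [2]}
importance = group_permutation_importance(toy_predict, X, y, groups)
```

In this toy setup the group that the scoring function ignores has an importance of exactly zero, while shuffling the informative group lowers the AUC substantially. Repeating the calculation for several models and comparing the resulting rankings is the step the abstract evaluates with the Friedman test.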
Lee et al. (Sun,) studied this question.