Accurate prediction of metastasis in lung adenocarcinoma is essential for effective treatment planning and improved prognosis, but current methods face challenges in both accuracy and clinical applicability. We developed a binary classification model (XGBoost) and a multiclass classification model (stacking) using data from the SEER database. The binary model predicts the presence of metastasis (M0 vs. non-M0), and the multiclass model further distinguishes the degree of metastasis (M1a, M1b, M1c). Model performance was assessed with the area under the ROC curve (ROC AUC), the area under the precision-recall curve (PR AUC), and Kolmogorov-Smirnov (KS) curves. SHAP values were used to identify important features and to explain the models' decision-making process. The binary model achieved ROC AUC and PR AUC scores exceeding 0.77, and its KS curve showed consistent separation between positive and negative samples. The multiclass model also performed well, demonstrating stability and generalizability across metastasis stages. Key predictive factors included AJCC stage, survival duration, tumor size, and treatment information. By using interpretable machine learning models, this study improves the accuracy and clinical applicability of metastasis prediction in lung adenocarcinoma. The combination of binary and multiclass models predicts not only whether metastasis is present but also its extent, providing valuable clinical decision support. Future research should integrate diverse data sources to enhance model robustness and better serve clinical practice.
Xu et al. studied this question.
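To make two of the evaluation metrics named in the abstract concrete, the following is a minimal sketch of how ROC AUC and the KS statistic can be computed from predicted scores for a binary M0 vs. non-M0 task. It is not the authors' pipeline: the function names (`roc_points`, `roc_auc`, `ks_statistic`) and the four-sample toy data are hypothetical and chosen only for illustration. The KS statistic is taken here as the largest gap between the true positive rate and the false positive rate over all thresholds, which is the usual reading of a KS curve in classification.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points, one per distinct score threshold.

    Labels are assumed to be 0 (negative, e.g. M0) or 1 (positive, e.g. non-M0).
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    # Sort by descending score so that each step lowers the decision threshold.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores.
        is_last = rank == len(order) - 1
        if is_last or scores[order[rank + 1]] != scores[i]:
            points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def ks_statistic(points: List[Tuple[float, float]]) -> float:
    """KS statistic: largest gap between TPR and FPR over all thresholds."""
    return max(tpr - fpr for fpr, tpr in points)


# Hypothetical toy data: 1 = non-M0, 0 = M0; scores are predicted probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

pts = roc_points(y_true, scores)
print("ROC AUC:", roc_auc(pts))
print("KS statistic:", ks_statistic(pts))
```

In practice a library implementation would normally be used for these metrics; the sketch only shows what the reported numbers measure. PR AUC and the SHAP analysis are not covered here.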