BACKGROUND: Gastric cancer (GC) is a leading cause of cancer-related death worldwide, and early detection is crucial for improving survival. Current non-invasive biomarkers lack sensitivity for early-stage disease, so more accurate diagnostic tools are needed.

METHODS: Untargeted metabolomics was performed by liquid chromatography-mass spectrometry (LC-MS) on serum samples from 151 GC patients and 103 healthy controls. A machine learning (ML) pipeline combining LASSO regression, random forest, and decision tree methods was applied for feature selection and model building. Ten ML classifiers were evaluated, and the final model was selected on the basis of cross-validation and test-set performance. Model interpretability was assessed with SHAP (SHapley Additive exPlanations) analysis, and clinical utility with decision curve analysis (DCA).

RESULTS: Of 2136 detected metabolites, four core metabolites, ribothymidine (rT), phytocassane B (PCB), enalapril (ENP), and sinapaldehyde (SA), were selected as a diagnostic panel. The random forest (RF) model achieved an AUC of 0.97 on the test set, significantly outperforming conventional biomarkers. Pathway analysis revealed dysregulation of lipid and amino acid metabolism. The model was well calibrated and yielded a higher net clinical benefit than the treat-all and treat-none strategies across a wide range of threshold probabilities.

CONCLUSIONS: This study developed and internally evaluated a high-performance, metabolomics-based ML model for early GC detection using a four-metabolite serum signature. The approach offers a non-invasive, interpretable diagnostic strategy with promising clinical potential.
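The abstract reports discrimination as an AUC and clinical utility through decision curve analysis. As a minimal sketch, assuming a binary outcome (1 = GC, 0 = healthy control) and a classifier that outputs predicted probabilities, both quantities can be computed from the predictions alone. This is not the authors' code: the function names are hypothetical, and the toy labels and probabilities below are made up for illustration, not study data. AUC is computed here as the Mann-Whitney probability that a randomly chosen case scores higher than a randomly chosen control, and net benefit uses the standard decision-curve definition NB = TP/n - (FP/n) * p_t / (1 - p_t) at threshold probability p_t. The treat-none strategy has a net benefit of 0 at every threshold.

```python
from typing import Sequence


def auc_mann_whitney(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], probs: Sequence[float], threshold: float) -> float:
    """Net benefit of treating everyone whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)  # harm-to-benefit weight implied by p_t
    return tp / n - (fp / n) * odds


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Net benefit of the treat-all reference strategy (everyone classified positive)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    odds = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * odds


if __name__ == "__main__":
    # Hypothetical toy data: 4 cases and 4 controls with made-up predicted probabilities.
    y = [1, 1, 1, 1, 0, 0, 0, 0]
    p = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
    print(auc_mann_whitney(y, p))          # 0.9375
    print(net_benefit(y, p, 0.5))          # 0.25
    print(net_benefit_treat_all(y, 0.5))   # 0.0
```

Plotting net benefit for the model, treat-all, and treat-none (0) over a grid of thresholds gives the decision curve; the model adds clinical value wherever its curve lies above both references. The pairwise AUC loop is O(n_pos * n_neg), which is fine for cohorts of a few hundred samples.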
Ye et al. studied this question.