What does this research mean for the field?

Multimodal machine learning models for non-small cell lung cancer provide consistent but modest improvements in overall survival prediction compared to single-modality approaches, though clinical adoption is currently constrained by high risk of bias and limited independent validation. Novelty: synthesis. Consensus alignment: neutral.

May 30, 2026

The use of multimodal machine learning models for predicting overall survival in patients with non-small cell lung cancer: A systematic review.

Key Points

Aimed to systematically review multimodal machine learning models for predicting overall survival in non-small cell lung cancer.
Conducted a systematic review following PRISMA 2020 guidelines.
Evaluated studies that developed or validated ML models integrating ≥2 modalities.
Assessed risk of bias using PROBAST and synthesized results narratively.
Included 18 studies from 2021 to 2025 with sample sizes from 115 to 2,898.
Internal C-indices for time-to-event OS ranged from 0.658 to 0.893; the highest value came from a CT+clinical model.
Multimodal models typically outperformed the best single-modality comparator by about 0.06 in C-index or AUROC, though absolute gains varied.
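
To make the C-index figures above concrete, here is a minimal sketch of Harrell's concordance index for right-censored survival data. It is an illustration only, not the method used by any of the reviewed studies: the function name, the toy data, and the choice to skip pairs with tied follow-up times are all assumptions made for this example.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times:       observed follow-up time per patient
    events:      1 if death was observed, 0 if the patient was censored
    risk_scores: model output, where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification (assumption): skip tied times. A pair is comparable
        # only if the patient with the shorter time had an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier death was ranked as higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: five patients, two of them censored.
toy_times = [5, 8, 12, 20, 30]
toy_events = [1, 1, 0, 1, 0]
toy_risks = [0.9, 0.4, 0.5, 0.3, 0.1]
toy_c = harrell_c_index(toy_times, toy_events, toy_risks)  # 7 of 8 comparable pairs -> 0.875
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a gain of about 0.06 means that roughly six more comparable patient pairs per hundred are ordered correctly.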

Abstract

e20000

Background: Multimodal machine-learning (ML) models that fuse imaging, pathology, omics, and clinical data may improve overall-survival (OS) prediction in non–small-cell lung cancer (NSCLC) beyond staging-based tools. We systematically reviewed the design, performance, and methodological quality of these models.

Methods: Following PRISMA 2020, we searched Ovid (MEDLINE/Embase/CENTRAL/CDSR), Scopus, IEEE Xplore, and arXiv (January 2017–July 2025). Eligible studies developed or validated ML models integrating ≥2 modalities to predict OS in adults with NSCLC and reported either time-to-event or fixed-horizon binary outcomes. Two reviewers independently screened, extracted, and assessed risk of bias using PROBAST with PROBAST-AI items. Due to heterogeneity, results were synthesized narratively.

Results: We included 18 studies (2021–2025) with per-study sample sizes ranging from 115 to 2,898. Outcome framing: time-to-event OS only (n=11), fixed-horizon binary OS only (n=4), and both (n=3). Modalities most often used were clinical structured data (15/18), CT (12/18), PET (6/18), molecular omics (7/18), pathology whole-slide images (4/18), and EHR text (1/18). Fusion strategies clustered as early/concatenation (10/18), interaction-based (attention/bilinear/graph; 5/18), and late/score-level (3/18). For time-to-event OS, internal C-indices ranged 0.658–0.893, with the highest internal value 0.893 (CT+clinical). One study reported external C-index (0.678, pathology+genes). An additional study reported external time-dependent AUC 0.845 at 1 year for a PET/CT-genomics survival model (n=32). For binary OS, internal AUROCs were 0.802–0.888 (2–5-year horizons), and internal accuracies ranged 0.68–0.93 (1–5 years). External binary performance included accuracy 0.72 at 1 year in an immunotherapy cohort. Across studies, multimodal models typically outperformed the best single-modality comparator by ~+0.06 C-index or AUROC, though absolute gains varied. Risk of bias was frequently high in the analysis domain (internal-only validation, optimistic tuning, sparse calibration reporting); code/weights were publicly available in 5/18 studies.

Conclusions: Multimodal ML models for NSCLC show consistent, modest improvements in OS discrimination versus single-modality approaches, with CT+clinical the most translationally ready pairing. However, independent validation, calibration, and transparency remain limited, constraining clinical adoption. Future work should prioritize multi-center datasets, standardized reporting, and open workflows.
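
The difference between the early/concatenation and late/score-level fusion strategies named in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not a reconstruction of any reviewed model: the linear scorers, feature values, weights, and the `ct_share` parameter are all hypothetical, and the reviewed studies generally use nonlinear learners.

```python
def linear_score(features, weights):
    """Weighted sum used as a stand-in for a trained risk model (assumption)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(f * w for f, w in zip(features, weights))


def early_fusion_risk(ct_features, clinical_features, joint_weights):
    """Early fusion: concatenate modality features, then score them with one model."""
    fused = list(ct_features) + list(clinical_features)
    return linear_score(fused, joint_weights)


def late_fusion_risk(ct_features, clinical_features,
                     ct_weights, clinical_weights, ct_share=0.5):
    """Late fusion: score each modality separately, then combine the scores."""
    ct_risk = linear_score(ct_features, ct_weights)
    clinical_risk = linear_score(clinical_features, clinical_weights)
    return ct_share * ct_risk + (1.0 - ct_share) * clinical_risk


# Hypothetical inputs: two CT-derived features and three clinical variables.
ct = [0.2, 1.0]
clinical = [1.0, 0.0, 0.5]

early = early_fusion_risk(ct, clinical, [0.5, 0.25, 1.0, 0.5, 2.0])
late = late_fusion_risk(ct, clinical, [0.5, 0.25], [1.0, 0.5, 2.0])
```

In early fusion a single model sees all features at once and can, in principle, learn cross-modality relationships; in late fusion each modality keeps its own model and only the output scores are merged. Interaction-based fusion (attention, bilinear, graph) sits between the two and is not shown here.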
