e20582 Background: Imaging-based machine learning models have been widely studied in medicine. Beyond diagnostic models, models that predict prognosis or treatment response have also been studied. Predicting response to immune checkpoint inhibitors (ICIs) in patients with non-small cell lung cancer (NSCLC) is an important challenge because response is heterogeneous, and multiple studies have attempted to predict ICI response. However, substantial heterogeneity exists across studies in imaging modalities, feature extraction strategies, and modeling approaches, limiting the interpretability and robustness of the results. Methods: We conducted a systematic review of imaging-based machine learning studies evaluating immunotherapy response prediction. Eligible studies used imaging-derived inputs, including CT, PET-CT, and whole-slide images (WSI), to predict treatment response or clinical outcomes. When a single study reported multiple models with minor methodological variations, only the most representative or best-performing model was included. Given marked heterogeneity in model design and input data, pooled performance estimates and formal meta-analysis were not performed. Results: A total of 97 studies comprising 255 models were included. Most models (81%) relied on radiomics-based feature extraction combined with regression or classical machine learning models, whereas only 23 models from 15 studies used end-to-end deep learning. In subgroup analyses, non-radiomics models (0.77) demonstrated a higher pooled AUC than radiomics-based models (0.73). CT-based models (0.75) were the most frequently studied and showed higher pooled performance than other modalities such as PET-CT (0.66) and WSI (0.62). Outcome definitions varied widely, and external validation was inconsistently performed.
Conclusions: This systematic review shows that imaging-based immunotherapy response prediction research is dominated by radiomics-driven models combined with classical machine learning, while end-to-end deep learning approaches remain uncommon. This pattern likely reflects practical constraints: many studies analyzed 3D imaging data and integrated clinical features in relatively small cohorts, which may have limited the use of end-to-end training or fine-tuning and made radiomics-based classical machine learning a more practical choice. The lower performance observed for PET-CT and WSI models likely reflects smaller sample sizes and limited validation rather than limitations of the modalities themselves. Overall, studies are heterogeneous in imaging modalities, modeling strategies, outcome definitions, and external validation; studies using larger cohorts with external validation will be important for AI-based immunotherapy response prediction.
Kim et al. (Thu,) studied this question.