A Random Forest model integrating clinical and NLP-derived features predicted immunotherapy response in hepatocellular carcinoma with an AUC of 0.77, precision of 0.72, and recall of 0.72.
Can machine learning models integrating clinical and NLP-derived features accurately predict immunotherapy response in patients with hepatocellular carcinoma?
302 patients with hepatocellular carcinoma (HCC) treated with immunotherapy at UPMC Hillman Cancer Center between December 2014 and December 2023.
Machine learning models (including Random Forest) integrating 50 combined clinical and NLP-derived text-embedded features from radiological and clinical notes.
Machine learning models restricted to 20 clinical variables alone.
Prediction of response to immunotherapy (stable disease vs. progression).
Machine learning models integrating both clinical and NLP-derived features from clinical notes can accurately predict immunotherapy response in patients with hepatocellular carcinoma.
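The comparison summarised above (clinical variables alone versus combined clinical and text-embedded features) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the data generator, the stratified split, the number of trees, the weighted averaging of precision and recall, and the helper name `evaluate` are all assumptions. Only the cohort size, the feature counts, the 80/20 split and the choice of a Random Forest come from the source.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

N_PATIENTS = 302   # cohort size reported in the study
N_CLINICAL = 20    # clinical-only feature count reported in the study
N_COMBINED = 50    # clinical + text-embedded feature count reported in the study

# Synthetic stand-in data (NOT the study data). Class 1 plays the role of
# "stable disease" (about 71%), class 0 the role of "progression" (about 29%).
X, y = make_classification(
    n_samples=N_PATIENTS,
    n_features=N_COMBINED,
    n_informative=10,
    n_redundant=0,
    weights=[0.29, 0.71],
    random_state=0,
)


def evaluate(X_subset, y, seed=0):
    """Fit a Random Forest on an 80/20 split and return test-set metrics.

    Hypothetical helper: stratification and weighted averaging are
    assumptions, since the abstract does not state either.
    """
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_subset, y, test_size=0.20, stratify=y, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_tr, y_tr)
    proba = model.predict_proba(X_te)[:, 1]  # predicted probability of class 1
    pred = model.predict(X_te)
    return {
        "auc": roc_auc_score(y_te, proba),
        "precision": precision_score(y_te, pred, average="weighted", zero_division=0),
        "recall": recall_score(y_te, pred, average="weighted"),
    }


# The first 20 columns stand in for the clinical variables; all 50 columns
# stand in for the combined clinical + text-embedded feature set.
results = {
    "clinical_only": evaluate(X[:, :N_CLINICAL], y),
    "combined": evaluate(X, y),
}

for name, metrics in results.items():
    print(name, {k: round(v, 2) for k, v in metrics.items()})
```

Because the data are synthetic, the printed numbers say nothing about the reported AUCs of 0.77 and 0.71. The sketch only shows how the two feature sets could be evaluated under the same split and the same metrics.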
Abstract

Background: Immunotherapy (IO) improves survival in advanced hepatocellular carcinoma (HCC), yet fewer than 30% of patients respond to treatment. Existing biomarkers have shown limited predictive accuracy. Machine learning (ML) and natural language processing (NLP) techniques could be used to develop prediction models that support personalised treatment. We aimed to develop and evaluate ML models that predict response to IO in patients with HCC.

Methods: We retrospectively analyzed data from 302 patients with HCC treated with IO at UPMC Hillman Cancer Center between December 2014 and December 2023. Five ML models were developed to predict IO response: logistic regression, random forest, XGBoost, support vector machine and multi-layer perceptron. Models were first trained on 20 clinical features, then expanded to 50 combined clinical and text-embedded features. Radiological and clinical notes were processed with an NLP model to generate text embeddings. The data were split into training (80%) and test (20%) sets. Shapley Additive Explanations (SHAP) was used to interpret the prediction models.

Results: Of the 302 patients, 215 (71%) had stable disease and 87 (29%) had progression. The best-performing model was the Random Forest classifier incorporating both clinical and NLP-derived features (AUC 0.77, precision 0.72, recall 0.72). Model performance decreased marginally when restricted to clinical variables alone (AUC 0.71, precision 0.70, recall 0.70). Key predictors of response to IO included lower alpha-fetoprotein (AFP), liver function tests within the normal range (AST, ALT, ALP, albumin, bilirubin), higher total protein and lower ECOG performance status grade. 142 patients had first-line IO treatment with atezolizumab and bevacizumab (Atezo/Bev) and 57 patients had durvalumab and tremelimumab (Durva/Treme). In a subgroup analysis, model performance was higher for patients receiving Atezo/Bev (test AUC-ROC 0.97) than for those receiving Durva/Treme (test AUC-ROC 0.66). However, there was no statistically significant difference in predicted mean response between the two IO regimens (Atezo/Bev 0.55, Durva/Treme 0.64; t-statistic -1.54, p-value 0.13).

Conclusions: This study demonstrates that ML models integrating both clinical and NLP-derived features can accurately predict IO response in patients with HCC. Key predictors of disease progression included AFP, liver function blood tests and ECOG performance status. Future work will externally validate these results on larger datasets, with the aim of developing generalizable and clinically useful predictive models.

Citation Format: Anwaar Saeed, Meghana Singh, Yuming Shi, Alireza Tojjari, Vaishnavi Balaji, Lakshya Sharma, Azhar Saeed, Thant Hoe, Yuxi Zhang, Sola Adeleke. Predicting immunotherapy response in patients with hepatocellular carcinoma from clinical and textual features using AI techniques [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 4219.
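The abstract compares predicted mean response between the two regimens with a t-statistic but does not say which variant of the t-test was used. As a hedged illustration, the sketch below computes Welch's unequal-variance t-statistic and its degrees of freedom using only the standard library. The function name `welch_t` and the two lists of predicted probabilities are hypothetical and chosen purely for demonstration; they are not study data.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test.

    Assumption: the source does not specify Welch's test; it is used here
    because it does not require equal variances in the two groups.
    """
    na, nb = len(a), len(b)
    # Squared standard errors of the two group means (sample variance / n).
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


# Hypothetical predicted response probabilities for each regimen.
atezo_bev = [0.42, 0.61, 0.55, 0.48, 0.70, 0.53]
durva_treme = [0.66, 0.58, 0.71, 0.62, 0.69]

t_stat, dof = welch_t(atezo_bev, durva_treme)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A negative t-statistic here means the first group has the lower mean, which matches the direction reported in the abstract (Atezo/Bev 0.55 versus Durva/Treme 0.64). Turning the statistic into a p-value needs the t-distribution; in practice a call such as `scipy.stats.ttest_ind(a, b, equal_var=False)` would return both.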
Anwaar Saeed
M. Singh
Yuming Shi
Cancer Research
University of St Andrews
UPMC Hillman Cancer Center
University of Vermont Medical Center
Saeed et al. reported that a Random Forest model integrating clinical and NLP-derived features predicted immunotherapy response in hepatocellular carcinoma with an AUC of 0.77, precision of 0.72, and recall of 0.72.
www.synapsesocial.com/papers/69d1fdbfa79560c99a0a3f93 — DOI: https://doi.org/10.1158/1538-7445.am2026-4219