What does this research mean for the field?

Multimodal AI models that integrate histopathological whole slide images with clinical variables, particularly using cross-attention fusion, improve survival prediction and risk stratification in head and neck cancer compared to using clinical variables alone. Novelty: methodological. Consensus alignment: neutral.

What question did this study set out to answer?

This study aims to improve risk stratification and treatment decisions for head and neck cancer using multimodal AI models.

May 30, 2026

Multimodal AI prediction of head and neck cancer treatment outcomes with whole slide imaging.

Key Points

This study aims to improve risk stratification and treatment decisions for head and neck cancer using multimodal AI models.
Analyzed 645 patients from the public HANCOCK dataset for model development and testing.
Utilized whole slide images and nine clinical variables as inputs to train prediction models.
Employed a machine learning-based Cox proportional hazards model (DeepSurv), assessed using Harrell's concordance index (C-index).
Unimodal model C-indices: clinical variables 0.62 +/- 0.04 and image features 0.65 +/- 0.07.
Multimodal model C-indices: late fusion 0.63 +/- 0.05, concatenation 0.67 +/- 0.07, and cross-attention 0.69 +/- 0.07. Cross-attention performed best; late fusion fell between the two unimodal models.
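
To make the reported metric concrete, here is a minimal sketch of how Harrell's C-index can be computed for right-censored survival data. It is an illustration only, not the study's evaluation code: the function name `harrell_c_index`, the toy data, and the choice to skip pairs with tied follow-up times are all assumptions made for this example.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data (illustrative sketch).

    times       -- follow-up time per patient
    events      -- 1/True if the event was observed, 0/False if censored
    risk_scores -- model output; a higher score should mean a shorter survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplifying assumption: pairs with tied times are skipped
        # a is the patient with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the true ordering is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk for the earlier event: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the second one is censored.
times = [5, 8, 12, 20]
events = [1, 0, 1, 1]
print(harrell_c_index(times, events, [0.9, 0.4, 0.6, 0.1]))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the 0.62 to 0.69 figures above should be read.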

Abstract

e18000

Background: Current risk stratification and adjuvant treatment decisions for head and neck cancer following surgical resection rely primarily on pathological risk factors. The growing adoption of digital pathology presents an opportunity to leverage histopathological image features to enhance risk stratification accuracy. This study investigated the performance of machine learning models for treatment outcome prediction using image features and traditional clinical variables, and identified optimal strategies for combining these modalities.

Methods: We analyzed data from 645 patients in the publicly available HANCOCK dataset for model development and testing. All patients had head and neck cancers treated with primary surgery with or without adjuvant therapy. For each patient, H&E-stained whole slide images of the primary tumor and nine clinical variables served as model inputs. Slide-level image embeddings were extracted using a pretrained vision-language pathology foundation model (TITAN). Clinical variables included tumor site, pT classification, pN classification, histologic grade, perineural invasion, lymphovascular invasion, extranodal extension, margin status, and smoking history. Models were trained to predict survival using a machine learning-based Cox proportional hazards model (DeepSurv), with performance measured by Harrell's concordance index (C-index). We compared unimodal models (using either clinical variables or image features alone) with multimodal models utilizing different fusion strategies (concatenation, late fusion, and cross-attention). Five-fold cross-validation with early stopping was implemented during training.

Results: Unimodal models achieved C-indices of 0.62 +/- 0.04 (clinical variables) and 0.65 +/- 0.07 (image features). Multimodal models demonstrated progressive improvement: late fusion (0.63 +/- 0.05), concatenation (0.67 +/- 0.07), and cross-attention (0.69 +/- 0.07), with cross-attention achieving the highest performance.

Conclusions: H&E-stained whole slide images from resected head and neck cancers contain significant prognostic information. Multimodal AI models integrating histopathological images with clinical variables, particularly using cross-attention fusion, enhance prognostic prediction and may improve risk stratification for adjuvant therapy decisions.

Table: Harrell's concordance index for each type of model input and feature fusion method.

Model input                      C-index (mean +/- std)
Clinical variables alone         0.62 +/- 0.04
Image features alone             0.65 +/- 0.07
Multimodal (late fusion)         0.63 +/- 0.05
Multimodal (concatenation)       0.67 +/- 0.07
Multimodal (cross attention)     0.69 +/- 0.07
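
The abstract names DeepSurv but does not spell out the training objective. DeepSurv-style models are typically trained by minimizing the negative log Cox partial likelihood, and the sketch below shows that loss under stated assumptions: the function name `cox_neg_log_partial_likelihood` is hypothetical, ties in event times get no special handling, and the loss is averaged over observed events. It is not the authors' implementation.

```python
import math


def cox_neg_log_partial_likelihood(times, events, log_risks):
    """Average negative log Cox partial likelihood (illustrative sketch).

    times     -- follow-up time per patient
    events    -- 1/True if the event was observed, 0/False if censored
    log_risks -- model output per patient (the log hazard ratio)
    """
    n_events = sum(1 for e in events if e)
    if n_events == 0:
        raise ValueError("at least one observed event is required")
    total = 0.0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored patients only contribute through risk sets
        # Risk set: every patient still under observation at time t_i.
        risk_set = [log_risks[j] for j in range(len(times)) if times[j] >= times[i]]
        # Log-sum-exp over the risk set, shifted by the max for numerical stability.
        m = max(risk_set)
        log_denominator = m + math.log(sum(math.exp(r - m) for r in risk_set))
        total += log_risks[i] - log_denominator
    return -total / n_events


# Toy data (hypothetical): risks ordered correctly vs. in reverse.
times = [1, 2, 3]
events = [1, 1, 1]
print(cox_neg_log_partial_likelihood(times, events, [2.0, 1.0, 0.0]))
print(cox_neg_log_partial_likelihood(times, events, [0.0, 1.0, 2.0]))
```

The first call, where patients with earlier events receive higher predicted risk, yields the lower loss. In a fused model the `log_risks` values would come from the network's final layer, whichever fusion strategy produced its input.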
