Abstract Objective. To improve survival prediction for HER2-positive breast cancer by integrating histopathological, molecular, and clinical data in a multimodal transformer framework. Approach. We propose SurvMBC, a multimodal transformer framework for survival prediction in HER2-stratified breast cancer. SurvMBC is a foundation model-enhanced architecture that fuses three data modalities: whole-slide images (WSIs), clinical narratives, and molecular features. Tumor microenvironment features are extracted with the pathology language-image pretraining (PLIP) encoder, clinical narratives are processed with BioBERT, and miRNA expression and DNA methylation data are embedded using Gen2Vec. These representations are integrated through a cross-modal transformer with attention mechanisms to predict survival. Main results. The model was evaluated on 1,095 HER2-positive breast cancer patients from The Cancer Genome Atlas. SurvMBC achieved a concordance index (C-index) of 0.857 (95% CI: 0.834, 0.880), a low integrated Brier score (IBS), and a strong inverse negative binomial log-likelihood (iNBLL). Risk stratification based on model outputs significantly separated high- and low-risk groups (log-rank p < 0.01), and the resulting risk groups were significantly associated with tumor stage, grade, and hormone receptor status (all p < 0.05). Significance. SurvMBC demonstrates the effectiveness of multimodal fusion in addressing tumor heterogeneity and improving prognostic accuracy. The attention-based integration enables context-aware learning of survival-relevant features across modalities, supporting individualized risk stratification and risk-adaptive treatment planning for HER2-stratified breast cancer patients.
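For readers unfamiliar with the headline metric, the concordance index reported above can be illustrated with a minimal sketch of Harrell's C-index for right-censored data. This is not the authors' evaluation code. The function name `concordance_index`, the toy inputs, and the handling of ties (tied risks count as half, tied times are skipped) are assumptions made for illustration.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs ranked correctly.

    times  -- observed follow-up time per patient
    events -- 1 if the event (death) was observed, 0 if censored
    risks  -- predicted risk score; higher means shorter expected survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is only comparable if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # earlier event got the higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, two observed events.
times = [5, 8, 12, 20]
events = [1, 0, 1, 0]
risks = [0.9, 0.4, 0.7, 0.8]
print(concordance_index(times, events, risks))  # 3 of 4 comparable pairs -> 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.857 should be read.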
Li et al. studied this question.