Survival prediction is a critical statistical technique in oncology for estimating survival probabilities or time-to-event outcomes. Identifying survival-related factors from pathology and genomic data is a key approach to analyzing these outcomes. However, current methods face several challenges: pre-trained vision foundation models are suboptimally adapted to specific tasks when extracting features from whole slide images (WSIs), and many pathology-based models fail to integrate repetitive gene expression information during pre-training. In this study, we propose a plug-and-play, multiple instance learning (MIL)-based foundation model tuning strategy that adapts vision foundation models to downstream tasks and incorporates knowledge from genomic data. Specifically, we introduce Task-specific Instance Selection, which uses zero-shot learning to efficiently select task-relevant WSI regions, improving tuning efficiency and reducing interference from irrelevant tissue areas. We also develop a multi-modal prompt token for model fine-tuning, which integrates genetic information into the prompt-tuning process and transfers information from the new modality to pre-trained vision foundation models. To further enhance the model's ability to learn genetic information during fine-tuning, we introduce a Gene Distribution Aware Task as an auxiliary task alongside the conventional survival task, which helps the model better perceive multimodal information. Extensive experiments on three public TCGA datasets demonstrate that our model outperforms all previous MIL-based methods and fine-tuning approaches.
Zhang et al. studied this question.
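To make the idea of zero-shot instance selection more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each WSI patch and a task-describing text prompt have already been embedded into a shared space by some vision-language model (an assumption, since the abstract does not specify the scoring mechanism), and it simply keeps the patches whose embeddings are most similar to the prompt. The function names, the parameter `k`, and the toy vectors are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_task_relevant_patches(patch_embeddings, prompt_embedding, k):
    """Hypothetical helper: rank patches by similarity to a task prompt and keep the top k.

    Returns the indices of the k highest-scoring patches and the full score list.
    No training is involved, which is what makes the selection "zero-shot" here.
    """
    scores = [cosine_similarity(p, prompt_embedding) for p in patch_embeddings]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k], scores


# Toy 2-D embeddings standing in for real patch and prompt features (assumed data).
patches = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
prompt = [1.0, 0.0]
selected, scores = select_task_relevant_patches(patches, prompt, k=2)
print(selected)
```

In a pipeline of this kind, only the selected patches would be passed on to the tuning stage, which is one plausible way to obtain the efficiency gain and the reduced interference from irrelevant tissue that the abstract describes.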