What does this research mean for the field?

Fine-tuning foundation models for brain tumor classification may compromise their generalization capabilities, while linear probing enables better performance with limited data. Novelty: novel finding. Consensus alignment: challenges consensus.

What question did this study set out to answer?

The aim is to evaluate transfer learning strategies for brain tumor classification using various models.

February 20, 2026

Transfer Learning Strategies for Pathological Foundation Models: A Systematic Evaluation in Brain Tumor Classification

Key Points

Compared fine-tuning and linear probing for brain tumor classification.
Used foundation models (UNI, Prov-GigaPath) and conventional models (ImageNet-pretrained ViT-L, CTransPath).
Trained models on an institutional dataset of 254 brain tumor cases.
Validated performance on a separate EBRAINS dataset of 698 cases.
Conventional models generally performed better with fine-tuning than linear probing.
Foundation models showed better performance with linear probing on external data.
UNI model using linear probing with 10 patches outperformed fine-tuned conventional models with 500 patches on external validation.

Abstract

Deploying pathology AI at individual hospitals faces challenges including limited cases and adapting pretrained models to local data. Brain tumor classification, with diverse diagnostic categories but few cases per institution, represents this challenge. Foundation models may offer a solution, but optimal transfer learning strategies remain unclear. We evaluated fine-tuning (FT) versus linear probing (LP) for brain tumor classification using foundation models (UNI, Prov-GigaPath) and conventional models including ImageNet-pretrained Vision Transformer (ViT-L) and CTransPath. Models were trained on an institutional dataset (254 cases: glioblastoma, astrocytoma, oligodendroglioma, PCNSL, and metastatic tumors) and validated on the EBRAINS dataset (698 cases). Conventional models maintained FT ≥ LP on both datasets. However, foundation models showed a reversal: FT only marginally outperformed LP on institutional data, and LP significantly outperformed FT on external data (p < 0.01), suggesting that fine-tuning may compromise the generalization capabilities of foundation models. Notably, UNI with LP using only 10 patches per case significantly outperformed fine-tuned conventional models using 500 patches on external validation (p < 0.001). These findings suggest that for foundation models, fine-tuning on limited institutional data may cause overfitting, and preserving pre-trained representations through linear probing enables more efficient AI implementation with better generalization.
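
To make the linear-probing setup concrete, the following is a minimal sketch, not the authors' pipeline. It assumes patch embeddings have already been extracted by a frozen encoder, and it makes several choices the abstract does not specify: mean pooling of patch embeddings into one case-level vector, a softmax-regression probe trained by plain gradient descent, and synthetic Gaussian features standing in for real encoder outputs. All function names (`aggregate_case`, `make_cases`, `train_linear_probe`, `predict`) are hypothetical.

```python
import numpy as np

def aggregate_case(patch_features):
    """Mean-pool patch embeddings (n_patches, dim) into one case vector (assumed aggregation)."""
    return patch_features.mean(axis=0)

def make_cases(rng, centers, cases_per_class, n_patches):
    """Synthetic stand-in for frozen-encoder outputs: Gaussian patches around class centers."""
    feats, labels = [], []
    for label, center in enumerate(centers):
        for _ in range(cases_per_class):
            patches = center + rng.normal(size=(n_patches, center.shape[0]))
            feats.append(aggregate_case(patches))
            labels.append(label)
    return np.stack(feats), np.array(labels)

def train_linear_probe(X, y, n_classes, lr=0.1, epochs=200, l2=1e-3):
    """Fit softmax regression on fixed features; only W and b are learned."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                          # one-hot targets
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        grad = (P - Y) / n                            # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ grad + l2 * W)
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(X, W, b):
    return (X @ W + b).argmax(axis=1)

rng = np.random.default_rng(0)
n_classes, dim, n_patches = 5, 32, 10                 # 5 tumor classes, 10 patches per case
centers = rng.normal(size=(n_classes, dim))

X_train, y_train = make_cases(rng, centers, cases_per_class=20, n_patches=n_patches)
X_test, y_test = make_cases(rng, centers, cases_per_class=10, n_patches=n_patches)

W, b = train_linear_probe(X_train, y_train, n_classes)
train_acc = (predict(X_train, W, b) == y_train).mean()
test_acc = (predict(X_test, W, b) == y_test).mean()
print(f"train accuracy: {train_acc:.2f}, held-out accuracy: {test_acc:.2f}")
```

The point of the sketch is structural: in linear probing the encoder that produced the features is never updated, so only `dim * n_classes + n_classes` parameters are fit to the small institutional dataset. Fine-tuning would instead propagate gradients into the encoder itself, which is where the abstract suggests overfitting may arise.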
