Abstract

Deploying pathology AI at individual hospitals faces two challenges: case numbers are limited, and pretrained models must be adapted to local data. Brain tumor classification, with diverse diagnostic categories but few cases per institution, exemplifies this problem. Foundation models may offer a solution, but optimal transfer learning strategies remain unclear. We evaluated fine‐tuning (FT) versus linear probing (LP) for brain tumor classification using foundation models (UNI, Prov‐GigaPath) and conventional models, including an ImageNet‐pretrained Vision Transformer (ViT‐L) and CTransPath. Models were trained on an institutional dataset (254 cases: glioblastoma, astrocytoma, oligodendroglioma, PCNSL, and metastatic tumors) and validated on the EBRAINS dataset (698 cases). For conventional models, FT performed at least as well as LP on both datasets. Foundation models, however, showed a reversal: FT only marginally outperformed LP on institutional data, and LP significantly outperformed FT on external data (p < 0.01), suggesting that fine‐tuning may compromise the generalization capabilities of foundation models. Notably, UNI with LP using only 10 patches per case significantly outperformed fine‐tuned conventional models using 500 patches per case on external validation (p < 0.001). These findings suggest that, for foundation models, fine‐tuning on limited institutional data may cause overfitting, whereas preserving pretrained representations through linear probing enables more efficient AI implementation with better generalization.
Enda et al. (Sun,) studied this question.
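To make the linear-probing setup in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a frozen encoder (here a fixed random projection standing in for a foundation model such as UNI), synthetic patch data, a softmax head trained by plain gradient descent, and case-level prediction by averaging patch probabilities. The abstract does not specify the aggregation rule, the classifier head, or any hyperparameters, so all of those, along with every name and dimension below, are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_CLASSES = 5        # glioblastoma, astrocytoma, oligodendroglioma, PCNSL, metastasis
RAW_DIM = 64           # hypothetical size of a flattened synthetic "patch"
FEATURE_DIM = 32       # hypothetical; real encoders emit larger embeddings
CASES_PER_CLASS = 8    # hypothetical, chosen only to keep the demo fast
PATCHES_PER_CASE = 10  # the smallest per-case setting mentioned in the abstract


def frozen_encoder(patches, weights):
    """Stand-in for a pretrained encoder; its weights are never updated."""
    return np.tanh(patches @ weights)


def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def train_linear_probe(features, labels, num_classes, lr=0.1, epochs=300):
    """Linear probing: fit only a linear softmax head on fixed features."""
    n, d = features.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    one_hot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        probs = softmax(features @ W + b)
        grad = (probs - one_hot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b


def predict_case(patch_features, W, b):
    """Assumed aggregation rule: average patch probabilities within a case."""
    probs = softmax(patch_features @ W + b)
    return int(probs.mean(axis=0).argmax())


# Synthetic data: each class has its own center; patches are noisy copies of it.
encoder_weights = rng.normal(scale=1 / np.sqrt(RAW_DIM), size=(RAW_DIM, FEATURE_DIM))
encoder_snapshot = encoder_weights.copy()
class_centers = rng.normal(size=(NUM_CLASSES, RAW_DIM))
case_labels = np.repeat(np.arange(NUM_CLASSES), CASES_PER_CLASS)
noise = 0.5 * rng.normal(size=(len(case_labels), PATCHES_PER_CASE, RAW_DIM))
case_patches = class_centers[case_labels][:, None, :] + noise

# Features are extracted once; only the linear head is trained afterwards.
case_features = frozen_encoder(case_patches, encoder_weights)
patch_features = case_features.reshape(-1, FEATURE_DIM)
patch_labels = np.repeat(case_labels, PATCHES_PER_CASE)

W, b = train_linear_probe(patch_features, patch_labels, NUM_CLASSES)
predictions = np.array([predict_case(f, W, b) for f in case_features])
accuracy = float((predictions == case_labels).mean())
print(f"case-level training accuracy on synthetic data: {accuracy:.2f}")
```

Fine-tuning would differ in that the gradient step would also update `encoder_weights`. With few cases, that extra capacity is the plausible route to the overfitting the abstract describes, whereas the probe above leaves the pretrained representation untouched.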