June 17, 2026. Open Access

Do Foundation Models Truly Outperform Domain-Specific Models? Evidence from Digital Pathology

Key Points

This research investigates whether foundation models outperform domain-specific models in different cancer types.
The study benchmarks seven general-purpose foundation models and three domain-specific models across eleven patch-level datasets from pediatric hematology, prostate cancer, and breast cancer.
Models are evaluated with two adaptation strategies: linear probing and last-layer fine-tuning.
Cross-dataset experiments test the robustness of the models under dataset shift.
In hematology, the specialist FM DINOBloom (AUC 0.990–0.999) matched and, on several datasets, marginally exceeded generalists such as GigaPath (AUC 0.981–1.000).
In prostate cancer, generalist FM UNI2-h outperformed specialist HistoEncoder with AUC of 0.956–0.977 vs. 0.908–0.964.
Across breast cancer tasks, UNI2-h had the highest performance, though no specialist FM was available for direct comparison.

Abstract

Foundation models (FMs) are increasingly proposed as general-purpose solutions for computational pathology, with the potential to simplify clinical artificial intelligence deployment by reducing the need for task-specific architectures. However, their reliability across cancer domains with distinct morphological characteristics remains unclear, limiting confidence in real-world clinical use. We benchmarked seven general-purpose pathology FMs and three domain-specific FMs across eleven patch-level datasets spanning three clinically relevant domains: pediatric hematology, prostate cancer, and breast cancer, using both linear probing and last-layer fine-tuning adaptation strategies. By jointly evaluating pediatric leukemia, male-predominant prostate cancer, and female-predominant breast cancer, this study is, to our knowledge, the first to explicitly examine specialist-versus-generalist FM behavior across age- and sex-stratified cancer populations. Performance differences were strongly domain dependent. In hematology, the specialist FM DINOBloom matched and, in several datasets, marginally exceeded leading generalist models (AUC 0.990–0.999 vs. GigaPath 0.981–1.000), suggesting advantages for highly distinctive cellular morphology. In prostate cancer grading, the generalist FM UNI2-h consistently outperformed the specialist HistoEncoder (AUC 0.956–0.977 vs. 0.908–0.964). In breast cancer, UNI2-h achieved the best overall performance across all tasks. No publicly available breast-cancer-specific FM currently exists for direct comparison; therefore, breast cancer results characterize general FM transferability rather than specialist-versus-generalist differences. Importantly, cross-dataset experiments revealed substantial performance degradation under dataset shift in both prostate and breast cancer, indicating that current FMs are not yet robust enough for heterogeneous multi-site clinical use. 
These findings support the use of generalist FMs as efficient backbones for well-characterized single-site, patch-level tasks, while challenging the assumption that high benchmark performance necessarily reflects true clinical readiness and demonstrating that pathology FMs are not uniformly superior to specialist models.
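
The evaluation protocol described above (a linear probe on frozen FM features, scored by AUC in-distribution and under dataset shift) can be illustrated with a minimal sketch. It is not the authors' pipeline. The embeddings are synthetic stand-ins, the function names (make_embeddings, fit_linear_probe, auc) are hypothetical, and the "shift" is simulated by rotating the class signal and adding noise, which is only one assumed way a second site could differ. Last-layer fine-tuning is not shown.

```python
import numpy as np


def make_embeddings(n, direction, noise, rng):
    """Synthetic stand-in for frozen FM patch embeddings (hypothetical data).

    The class signal lies along `direction`; `noise` scales the spread.
    """
    y = rng.integers(0, 2, size=n)
    x = noise * rng.normal(size=(n, len(direction)))
    x += 2.0 * np.outer(y, direction)
    return x, y


def fit_linear_probe(x, y, lr=0.1, steps=500):
    """Logistic regression on frozen features via plain gradient descent."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        grad = p - y
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def auc(scores, y):
    """AUC from the rank-sum (Mann-Whitney) statistic; assumes no tied scores."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = int(y.sum())
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


rng = np.random.default_rng(0)
dim = 16
basis = np.eye(dim)
in_site = basis[0]                              # signal direction at the training site
shifted = (basis[0] + basis[1]) / np.sqrt(2.0)  # assumed shift: rotated signal

x_train, y_train = make_embeddings(2000, in_site, 1.0, rng)
x_test, y_test = make_embeddings(1000, in_site, 1.0, rng)
x_shift, y_shift = make_embeddings(1000, shifted, 2.0, rng)  # noisier second site

# The backbone stays frozen; only the linear head (w, b) is trained.
w, b = fit_linear_probe(x_train, y_train)

auc_in = auc(x_test @ w + b, y_test)
auc_shift = auc(x_shift @ w + b, y_shift)
print(f"in-distribution AUC: {auc_in:.3f}")
print(f"cross-dataset AUC:   {auc_shift:.3f}")
```

In this toy setup the probe is trained once and scored twice, so any gap between the two AUC values comes only from the simulated change in the data, which mirrors the logic of a cross-dataset robustness check.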
