Abstract

Medical imaging models increasingly achieve high performance under controlled experimental settings, yet their reliability often degrades when applied across institutions, scanners, imaging protocols, and patient populations. This review examines Convolutional Neural Network (CNN) and Vision Transformer (ViT) architectures, together with domain adaptation methods, to address the generalization challenges posed by domain shift in medical imaging. By integrating 160 high-quality studies into a unified analytical framework, the review identifies scenarios in which ViT-based methods offer potential advantages, summarizes effective adaptation techniques, and discusses key obstacles to clinical implementation. The findings show that although ViTs are well suited to capturing global context and long-range dependencies, their reported performance improvements do not imply universal dominance. Their effectiveness depends on dataset characteristics, shift severity, pretraining quality, adaptation strategy, and task type. Adversarial learning, feature alignment, and self-supervised pretraining are identified as effective strategies for reducing cross-domain performance degradation, while CNN–ViT hybrid models offer a practical trade-off between accuracy, robustness, and computational feasibility. Nonetheless, high computational demands, substantial data requirements, interpretability limitations, and the lack of standardized multicenter evaluations continue to hinder widespread clinical use. To address these gaps, the review emphasizes a domain-aware evaluation perspective that distinguishes average task performance from robustness-oriented measures, including cross-domain performance drop, domain-wise variability, worst-domain performance, and calibration under distribution shift.
Future research should focus on efficient ViT designs, small-sample adaptation, privacy-preserving frameworks, and robust evaluation protocols that account for cross-domain variability, calibrated uncertainty, workflow integration, regulatory readiness, human oversight, drift monitoring, and controlled model updating in real-world medical imaging environments.
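The robustness-oriented measures named in the abstract (cross-domain performance drop, domain-wise variability, worst-domain performance, and calibration under distribution shift) can be illustrated with a minimal Python sketch. This is not the review's own protocol: the function names `domain_aware_summary` and `expected_calibration_error`, the choice of accuracy as the task metric, the equal-width binning, and the hospital names and numbers in the usage example are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def domain_aware_summary(acc_by_domain, source):
    """Summarize per-domain accuracy for one model (illustrative helper).

    acc_by_domain: dict mapping domain name -> accuracy in [0, 1]
    source: key of the domain the model was trained on
    """
    # Target domains are every evaluated domain except the training one.
    targets = {d: a for d, a in acc_by_domain.items() if d != source}
    if not targets:
        raise ValueError("need at least one target domain")
    worst = min(targets, key=targets.get)
    return {
        # Average task performance over all domains.
        "mean_accuracy": mean(acc_by_domain.values()),
        # Source accuracy minus mean target accuracy.
        "cross_domain_drop": acc_by_domain[source] - mean(targets.values()),
        # Population standard deviation across domains.
        "domain_std": pstdev(acc_by_domain.values()),
        # Lowest-scoring target domain and its accuracy.
        "worst_domain": worst,
        "worst_domain_accuracy": targets[worst],
    }


def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: sample-weighted mean of |accuracy - confidence|."""
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = mean(c for c, _ in b)
            avg_acc = mean(1.0 if ok else 0.0 for _, ok in b)
            ece += len(b) / n * abs(avg_acc - avg_conf)
    return ece


# Hypothetical numbers, not taken from any study in the review.
summary = domain_aware_summary(
    {"hospital_A": 0.9, "hospital_B": 0.8, "hospital_C": 0.7},
    source="hospital_A",
)
ece_target = expected_calibration_error(
    [0.9, 0.9, 0.6, 0.6], [True, True, True, False]
)
```

Reporting the summary alongside a per-target-domain calibration error keeps average accuracy from hiding a weak or overconfident site; computing the calibration error separately on each target domain is one simple way to approximate "calibration under distribution shift".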
Kaan ARIK studied this question.