June 20, 2026 | Open Access

From CNNs to Vision Transformers: A Survey on Addressing Domain Shift and Adaptation Concepts in Medical Image Analysis

Key points

The review aims to analyze CNN and ViT architectures for their effectiveness against domain shift in medical imaging.
Review of 160 studies examining CNN and ViT models in medical imaging.
Analysis of domain adaptation methods like adversarial learning and self-supervised pretraining.
Discussion of clinical implementation challenges and evaluation metrics for domain robustness.
ViT methods show potential advantages in capturing global context but are not universally superior.
Effective adaptation strategies identified include adversarial learning and self-supervised pretraining.
Challenges such as high computational demands, interpretability issues, and lack of multicenter evaluations hinder clinical use.

Abstract

Medical imaging models increasingly achieve high performance under controlled experimental settings, yet their reliability often degrades when they are deployed across institutions, scanners, imaging protocols, and patient populations. This review examines Convolutional Neural Network (CNN) and Vision Transformer (ViT) architectures, together with domain adaptation methods, as ways to address the generalization challenges that domain shift poses in medical imaging. By integrating 160 high-quality studies into a unified analytical framework, the review identifies scenarios in which ViT-based methods offer potential advantages, summarizes effective adaptation techniques, and discusses key obstacles to clinical implementation. The findings show that although ViTs are well suited to capturing global context and long-range dependencies, their reported performance gains do not imply universal superiority; their effectiveness depends on dataset characteristics, shift severity, pretraining quality, adaptation strategy, and task type. Adversarial learning, feature alignment, and self-supervised pretraining are identified as effective strategies for reducing cross-domain performance degradation, while CNN–ViT hybrid models offer a practical trade-off among accuracy, robustness, and computational feasibility. Nonetheless, high computational demands, substantial data requirements, limited interpretability, and the lack of standardized multicenter evaluations continue to hinder widespread clinical use. To address these gaps, the review emphasizes a domain-aware evaluation perspective that distinguishes average task performance from robustness-oriented measures, including cross-domain performance drop, domain-wise variability, worst-domain performance, and calibration under distribution shift.
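Feature alignment, one of the strategies the review identifies, trains the encoder so that features extracted from source-domain and target-domain images follow similar distributions. The abstract does not say which alignment criterion the surveyed studies use. As one common example, the minimal sketch below computes the squared maximum mean discrepancy (MMD², biased estimator) with a Gaussian kernel between two batches of feature vectors and adds it as a penalty to a task loss. The function names, the fixed bandwidth `sigma`, the weight `lam`, and the representation of features as plain Python lists are illustrative assumptions; in a real pipeline the penalty would be computed on CNN or ViT encoder outputs inside a deep learning framework so that it can be backpropagated.

```python
import math


def gaussian_kernel(a, b, sigma=1.0):
    # k(a, b) = exp(-||a - b||^2 / (2 * sigma^2))
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mean_kernel(xs, ys, sigma):
    # Average kernel value over all pairs (x, y) with x in xs, y in ys.
    total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd2(source_feats, target_feats, sigma=1.0):
    # Biased MMD^2 estimate: E[k(s, s')] + E[k(t, t')] - 2 E[k(s, t)].
    # It is 0 for identical batches and grows as the feature distributions diverge.
    return (mean_kernel(source_feats, source_feats, sigma)
            + mean_kernel(target_feats, target_feats, sigma)
            - 2.0 * mean_kernel(source_feats, target_feats, sigma))


def aligned_loss(task_loss, source_feats, target_feats, lam=0.1, sigma=1.0):
    # Hypothetical training objective: supervised loss on labeled source data
    # plus a weighted penalty on the source/target feature discrepancy.
    return task_loss + lam * mmd2(source_feats, target_feats, sigma)


if __name__ == "__main__":
    # Toy 2-D "features" (hypothetical values, for illustration only).
    source = [[0.0, 0.0], [0.1, 0.0]]
    nearby_target = [[0.2, 0.0], [0.3, 0.0]]
    shifted_target = [[3.0, 3.0], [3.1, 3.0]]
    print("MMD^2 nearby :", mmd2(source, nearby_target))
    print("MMD^2 shifted:", mmd2(source, shifted_target))
```

A Gaussian kernel is used here because it compares whole distributions rather than only their means; the choice of `sigma` strongly affects the penalty and is usually tuned or replaced by a mixture of bandwidths.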
Future research should focus on efficient ViT designs, small-sample adaptation, privacy-preserving frameworks, and robust evaluation protocols that account for cross-domain variability, calibrated uncertainty, workflow integration, regulatory readiness, human oversight, drift monitoring, and controlled model updating in real-world medical imaging environments.
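The domain-aware evaluation perspective described above separates average performance from robustness. The following is a minimal sketch of the four named measures under stated assumptions: scores are "higher is better" (for example, accuracy or Dice), the cross-domain drop is taken relative to the mean over external domains, domain-wise variability is the population standard deviation, and calibration is measured per domain with an equal-width-bin expected calibration error (ECE). These definitions, the function names `domain_aware_report` and `expected_calibration_error`, and the example numbers are illustrative choices, not necessarily those used in the surveyed studies.

```python
from statistics import mean, pstdev


def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |accuracy - confidence| over bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Map a confidence in [0, 1] to a bin index; 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        acc = sum(ok for _, ok in members) / len(members)
        avg_conf = sum(c for c, _ in members) / len(members)
        ece += len(members) / total * abs(acc - avg_conf)
    return ece


def domain_aware_report(in_domain_score, external_scores):
    """Summarize per-domain scores (higher is better).

    in_domain_score: score on held-out data from the training (source) domain.
    external_scores: dict mapping an external domain name to its score.
    """
    scores = list(external_scores.values())
    worst = min(external_scores, key=external_scores.get)
    return {
        "mean_external": mean(scores),
        # Cross-domain performance drop relative to in-domain performance.
        "performance_drop": in_domain_score - mean(scores),
        # Domain-wise variability (population standard deviation).
        "domain_std": pstdev(scores),
        # Worst-domain performance.
        "worst_domain": worst,
        "worst_score": external_scores[worst],
    }


if __name__ == "__main__":
    # Hypothetical numbers for illustration only; not results from the review.
    report = domain_aware_report(0.90, {"site_A": 0.85, "site_B": 0.78, "site_C": 0.82})
    print(report)
    # Calibration on one hypothetical shifted domain: confidences vs. correctness.
    print(expected_calibration_error([0.95, 0.95, 0.65, 0.65], [1, 0, 1, 1]))
```

Reporting the worst-domain score and per-domain ECE alongside the mean keeps a single poorly served site, or an overconfident model on shifted data, from being hidden by a good average.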
