Cross-domain few-shot learning (CD-FSL) remains challenging in medical imaging, where labeled data are scarce and source–target domain gaps are often large because of modality differences. Existing few-shot learning methods typically rely on similarity between the source and target domains, which limits their effectiveness in cross-modality settings such as MRI-to-CT transfer. To address this problem, this paper proposes an adapter-based Vision Transformer framework for cross-domain few-shot brain tumor classification. Lightweight adapter modules are inserted into a pretrained Vision Transformer to enable parameter-efficient domain adaptation without fine-tuning the entire backbone. A Prototypical Network constructs class prototypes from the limited labeled samples, and a prototype-level Maximum Mean Discrepancy (MMD) loss aligns feature distributions across domains. Unlike prior approaches, the proposed framework performs this prototype-level alignment within a single episodic learning paradigm, enabling direct class-wise cross-modal alignment. By jointly optimizing representation learning and domain adaptation, the design improves generalization under large modality gaps and limited labeled data. The framework is evaluated on MRI-to-CT brain tumor classification and on several heterogeneous cross-domain benchmarks: Chest X-ray, ISIC, CropDisease, and EuroSAT. Experimental results show that the proposed method achieves performance competitive with existing few-shot learning baselines and remains robust under significant domain shifts.
Gull et al. (Mon,) studied this question.
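To make the prototype construction and prototype-level MMD alignment summarized in the abstract more concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes that features have already been extracted by the backbone, that every class has at least one support sample, and that "prototype-level MMD" means the squared MMD between the set of source prototypes and the set of target prototypes under an RBF kernel. The function names, the kernel choice, the bandwidth `gamma`, the biased estimator, and the toy data are all illustrative assumptions.

```python
import numpy as np


def class_prototypes(features, labels, num_classes):
    """Mean embedding per class. features: (n, d); labels: (n,) ints in [0, num_classes).

    Assumes every class has at least one support sample.
    """
    return np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])


def rbf_kernel(x, y, gamma):
    """RBF kernel matrix between the rows of x (n, d) and y (m, d)."""
    sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)


def mmd2(x, y, gamma=1.0):
    """Biased estimate of the squared MMD between sample sets x and y."""
    k_xx = rbf_kernel(x, x, gamma).mean()
    k_yy = rbf_kernel(y, y, gamma).mean()
    k_xy = rbf_kernel(x, y, gamma).mean()
    return k_xx + k_yy - 2.0 * k_xy


def classify(queries, prototypes):
    """Assign each query to the nearest prototype (squared Euclidean distance)."""
    sq_dist = ((queries[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    return sq_dist.argmin(axis=1)


# Toy 2-way 5-shot episode with 8-dimensional features (synthetic data).
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(2), 5)
src_feats = rng.normal(size=(10, 8)) + 3.0 * labels[:, None]
tgt_feats = src_feats + 0.5  # hypothetical modality shift

src_protos = class_prototypes(src_feats, labels, num_classes=2)
tgt_protos = class_prototypes(tgt_feats, labels, num_classes=2)

# Scalar that could be added to the episodic classification loss.
alignment_loss = mmd2(src_protos, tgt_protos, gamma=0.1)
predictions = classify(tgt_feats, tgt_protos)
```

In a training loop, `alignment_loss` would be weighted and added to the prototypical classification loss, with gradients flowing only through the adapter parameters; that part requires an autodiff framework and is omitted here.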