ABSTRACT Multimodal medical image fusion (MMIF) improves diagnostic accuracy by integrating complementary information from multiple imaging modalities. However, current fusion methods face several challenges, including modality‐specific drawbacks, restricted generalizability, high processing costs, and limited explainability. This paper introduces Federated and Explainable Multimodal Medical Image Fusion (FEMMIF), a hybrid framework designed to address these challenges. FEMMIF employs a modality‐agnostic dual‐branch encoder based on MobileNetV3 to extract both anatomical and functional features. These features are integrated using cross‐attention and then passed through a Feature Importance Learning (FIL) module, which dynamically weights the contribution of each modality. The fused representation is then decoded into the final image by a decoder with residual refinement. Evaluations on multimodal datasets covering MRI, PET, CT, and SPECT show that FEMMIF consistently outperforms leading MMIF methods, achieving a structural similarity index (SSIM) greater than 0.91, improved entropy metrics, and faster inference. The model generalizes well across modalities, is less sensitive to misalignment, and produces interpretable outputs suitable for clinical validation. The federated training approach preserves privacy while achieving convergence comparable to centralized training. Overall, FEMMIF demonstrates strong experimental performance but requires further prospective and multicenter validation before clinical deployment.
Alabduljabbar et al. studied this question.
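The abstract states that the FIL module dynamically weights each modality's contribution but does not say how those weights are computed. The following is a minimal sketch of one common way to do this, assuming a scalar importance score per modality that is normalized with a softmax and used for a weighted sum of feature vectors. The function names `softmax` and `fuse_features`, the scalar scores, and the plain-list feature vectors are all illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(scores):
    """Turn raw importance scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(features, scores):
    """Weighted sum of per-modality feature vectors.

    features: list of equal-length feature vectors, one per modality
              (hypothetical stand-ins for encoder outputs).
    scores:   one raw importance score per modality (assumed to be
              produced by some learned scoring layer, not shown here).
    Returns (weights, fused_vector).
    """
    if len(features) != len(scores):
        raise ValueError("need exactly one score per modality")
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("all feature vectors must have the same length")

    weights = softmax(scores)
    fused = [
        sum(w * f[i] for w, f in zip(weights, features))
        for i in range(length)
    ]
    return weights, fused


if __name__ == "__main__":
    anatomical = [1.0, 2.0]   # e.g. features from an MRI or CT branch
    functional = [3.0, 4.0]   # e.g. features from a PET or SPECT branch
    weights, fused = fuse_features([anatomical, functional], [0.0, 0.0])
    print(weights, fused)
```

With equal scores, both modalities receive the same weight and the fused vector is their element-wise mean. Raising one score shifts the fused result toward that modality, and the weights themselves can be inspected as a simple per-modality explanation.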