Accurate and interpretable analysis of medical images requires the integration of complementary modalities such as Magnetic Resonance Imaging (MRI) and Computed Tomography (CT). This paper proposes SE-MMFusionGAN, a Sustainable and Explainable Multi-Modal Fusion Generative Adversarial Network that unifies interpretability, efficiency, and fusion quality within a single framework. Dual modality-specific encoders extract structural and textural representations, which are fused through a Cross-Modality Attention (CMA) mechanism to preserve clinically relevant features. An Activation-Efficiency Regularization module penalizes redundant activations, reducing computational and energy overhead, while a gradient-based attribution mechanism provides modality-aware explainability through interpretable heatmaps. The resulting fused representation supports both visual interpretation by clinicians and downstream analysis, including anomaly localization. Evaluations on benchmark datasets, including BraTS 2024, CT–MRI paired data, and BMAD, show that SE-MMFusionGAN achieves higher PSNR, SSIM, Dice, and IoU scores than state-of-the-art fusion networks, along with a significant reduction in energy consumption. The framework thus balances fidelity, interpretability, and sustainability, offering a practical step toward energy-aware and explainable multi-modal medical image fusion.
Dwivedi et al. (Sat,) studied this question.
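
For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how PSNR, Dice, and IoU are commonly computed. It uses the standard textbook definitions and is not taken from the SE-MMFusionGAN implementation; the function names, the flattened-list inputs, and the default `max_value` of 255 are illustrative assumptions.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], fused: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio (dB) between two flattened images.

    max_value is the assumed dynamic range (255 for 8-bit images).
    """
    if len(reference) != len(fused) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - f) ** 2 for r, f in zip(reference, fused)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


def dice(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must be of equal length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def iou(pred: Sequence[int], target: Sequence[int]) -> float:
    """Intersection over union |A ∩ B| / |A ∪ B| for flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must be of equal length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Same empty-mask convention as in dice().
    return 1.0 if union == 0 else intersection / union
```

PSNR here measures pixel-level fidelity of a fused image against a reference, while Dice and IoU measure overlap between a predicted and a ground-truth mask, as in anomaly localization. SSIM is omitted because it requires windowed local statistics and is usually taken from an image-processing library.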