Infrared and visible image fusion integrates complementary information from heterogeneous sensors for remote-sensing and Earth-observation applications. To better balance global contextual modeling against local structural preservation, we propose MSF-Net, a multi-scale frequency-aware fusion network with a hierarchical design. The framework consists of two main stages: multi-scale feature extraction with frequency-domain interaction, and hierarchical cross-modal fusion. Its basic building unit is a hybrid spatial-frequency encoding block (HSFEB), which combines a spatial-frequency interaction module (SFIM) that aggregates global context in the frequency domain with a structure-guided feature refinement module (SGFRM) that preserves local structural details. A hierarchical feature fusion module (HFFM) then progressively integrates cross-modal and cross-scale features in a coarse-to-fine manner. The fusion process is supervised by a joint loss function composed of intensity and structural constraints. Extensive experiments on three public benchmarks (MSRS, M3FD, and TNO) demonstrate that MSF-Net outperforms nine state-of-the-art methods in both qualitative and quantitative evaluations. The results show that the proposed method effectively enhances thermal targets, preserves structural details, and maintains visual naturalness across diverse remote-sensing scenarios.
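The abstract does not give the exact form of the joint loss, so the following is only a minimal sketch of one common way to combine an intensity constraint with a structural constraint in infrared-visible fusion. It assumes single-channel images as 2-D arrays, an element-wise maximum as the intensity target, and forward-difference gradients as the structural cue. The function names, the weights `w_int` and `w_struct`, and these target choices are illustrative assumptions, not the formulation used in MSF-Net.

```python
import numpy as np


def gradient_magnitude(img):
    """L1 gradient magnitude from forward differences (assumed structural cue)."""
    img = np.asarray(img, dtype=np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, :-1] = img[:, 1:] - img[:, :-1]  # horizontal differences
    gy[:-1, :] = img[1:, :] - img[:-1, :]  # vertical differences
    return np.abs(gx) + np.abs(gy)


def joint_fusion_loss(fused, ir, vis, w_int=1.0, w_struct=1.0):
    """Weighted sum of an intensity term and a structural term (hypothetical form)."""
    fused = np.asarray(fused, dtype=np.float64)
    ir = np.asarray(ir, dtype=np.float64)
    vis = np.asarray(vis, dtype=np.float64)

    # Intensity term: pull the fused image toward the brighter source pixel,
    # which tends to keep salient thermal targets.
    loss_int = np.mean(np.abs(fused - np.maximum(ir, vis)))

    # Structural term: pull fused gradients toward the stronger source gradient,
    # which tends to keep edges and texture.
    target_grad = np.maximum(gradient_magnitude(ir), gradient_magnitude(vis))
    loss_struct = np.mean(np.abs(gradient_magnitude(fused) - target_grad))

    return w_int * loss_int + w_struct * loss_struct
```

Under these assumptions the loss is zero when both sources are identical and the fused image reproduces them, and it grows as the fused image loses intensity or edge content. A training implementation would express the same terms in an automatic-differentiation framework.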
Hu et al. (Wed,) studied this question.