Infrared and visible image fusion (IVIF) aims to generate high-fidelity images by integrating complementary cross-modal information. However, existing methods often lack robustness when handling pronounced modality discrepancies and complex structures, and frequency-domain modeling typically incurs substantial computational redundancy. To address these challenges, we propose a residual prior-guided wavelet–Fourier iterative fusion network (RWFI-Fusion). The proposed approach decouples frequencies via the wavelet transform and adopts a divide-and-conquer strategy to model low-frequency and high-frequency components separately. Because Fourier transforms perform global modeling at full image resolution and can incur unnecessary computational overhead, we exploit the complementary strengths of wavelet and Fourier transforms to enhance the intrinsic frequency-domain representations of the different modalities. In addition, a residual prior is introduced to explicitly extract complementary information from modality discrepancies, thereby improving information propagation during fusion. The fusion process is further refined within an iterative optimization framework that dynamically regulates cross-modal information flow, enabling progressive enhancement of the fused results. Extensive experiments on four datasets and on downstream object detection tasks demonstrate that RWFI-Fusion consistently outperforms existing methods in both quantitative metrics and visual quality while maintaining high computational efficiency.
Wang et al. studied this question.
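The frequency-decoupling idea in the abstract can be illustrated with a small hand-crafted sketch. This is not the RWFI-Fusion network: the learned modules are replaced here by fixed rules chosen purely for illustration, and the iterative refinement stage is not modeled. The assumptions are a single-level Haar wavelet, a Fourier amplitude/phase rule applied only to the half-resolution low-frequency sub-band, and a residual (infrared minus visible) that is added to the visible high-frequency sub-bands wherever the infrared response is stronger. All function names are hypothetical.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level orthonormal 2D Haar transform (even height and width required)."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a + b - c - d) / 2.0  # detail: difference between rows
    hl = (a - b + c - d) / 2.0  # detail: difference between columns
    hh = (a - b - c + d) / 2.0  # detail: diagonal difference
    return ll, (lh, hl, hh)


def haar_idwt2(ll, highs):
    """Inverse of haar_dwt2."""
    lh, hl, hh = highs
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


def fuse_low(ll_ir, ll_vis):
    """Assumed rule: larger Fourier amplitude, phase taken from the visible image.

    The FFT runs on the half-resolution sub-band, not on the full image.
    """
    f_ir, f_vis = np.fft.fft2(ll_ir), np.fft.fft2(ll_vis)
    amp = np.maximum(np.abs(f_ir), np.abs(f_vis))
    phase = np.angle(f_vis)
    return np.real(np.fft.ifft2(amp * np.exp(1j * phase)))


def fuse_high(h_ir, h_vis):
    """Assumed rule: add the cross-modal residual where infrared detail is stronger."""
    residual = h_ir - h_vis
    mask = np.abs(h_ir) > np.abs(h_vis)
    return h_vis + mask * residual


def fuse(ir, vis):
    """Fuse two grayscale images of identical, even-sized shape."""
    ll_ir, highs_ir = haar_dwt2(ir)
    ll_vis, highs_vis = haar_dwt2(vis)
    ll = fuse_low(ll_ir, ll_vis)
    highs = tuple(fuse_high(hi, hv) for hi, hv in zip(highs_ir, highs_vis))
    return haar_idwt2(ll, highs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ir, vis = rng.random((64, 64)), rng.random((64, 64))
    print(fuse(ir, vis).shape)
```

Because the wavelet step halves each spatial dimension, the FFT in this sketch operates on a quarter of the pixels, which is one plausible reading of how combining the two transforms could reduce the overhead of full-resolution Fourier modeling.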