Optical and synthetic aperture radar (SAR) imagery are highly complementary in terms of texture details and structural scattering characterization. However, their imaging mechanisms and statistical distributions differ substantially. In particular, pseudo-high-frequency components introduced by SAR coherent speckle can be easily entangled with genuine optical edges, leading to texture mismatch, structural drift, and noise diffusion. To address these issues, we propose WEMFusion, a wavelet-prior-driven framework for frequency-domain decoupling and discrepancy-aware state-space fusion. Specifically, a multi-scale discrete wavelet transform (DWT) explicitly decomposes the inputs into low-frequency structural components and directional high-frequency sub-bands, providing an interpretable frequency-domain constraint for cross-modality alignment. We design a hybrid-modality enhancement (HME) module: in the high-frequency branch, it effectively injects optical edges and directional textures while suppressing the propagation of pseudo-high-frequency artifacts, and in the low-frequency branch, it reinforces global structural consistency and prevents speckle perturbations from leaking into the structural component, thereby mitigating structural drift. Furthermore, we introduce a discrepancy-aware gated Mamba fusion (DAG-MF) block, which generates dynamic gates from modality differences and complementary responses to modulate the parameters of a directionally scanned two-dimensional state-space model, so that long-range dependency modeling focuses on discrepant regions while preserving directional coherence. Extensive quantitative evaluations and qualitative comparisons demonstrate that WEMFusion consistently improves structural fidelity and edge detail preservation across multiple optical–SAR datasets, achieving superior fusion quality with lower computational overhead.
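As an illustration of the decomposition step described above, the following minimal sketch splits an image into one low-frequency component and directional high-frequency sub-bands at several scales. It assumes a Haar wavelet with orthonormal scaling, which the text does not specify. The function names `haar_dwt2` and `multiscale_dwt` are hypothetical and are not taken from WEMFusion itself.

```python
def haar_dwt2(image):
    """Single-level 2D Haar DWT of a 2D list with even height and width.

    Returns (approx, horizontal, vertical, diagonal): the low-frequency
    approximation and three directional high-frequency sub-bands.
    Assumption: Haar filters; the source does not name a wavelet family.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image height and width must be even")
    approx, horizontal, vertical, diagonal = [], [], [], []
    for i in range(0, rows, 2):
        a_row, h_row, v_row, d_row = [], [], [], []
        for j in range(0, cols, 2):
            # The 2x2 block [[a, b], [c, d]] at position (i, j).
            a, b = image[i][j], image[i][j + 1]
            c, d = image[i + 1][j], image[i + 1][j + 1]
            a_row.append((a + b + c + d) / 2)  # local average (low frequency)
            h_row.append((a + b - c - d) / 2)  # top vs. bottom: horizontal edges
            v_row.append((a - b + c - d) / 2)  # left vs. right: vertical edges
            d_row.append((a - b - c + d) / 2)  # diagonal detail
        approx.append(a_row)
        horizontal.append(h_row)
        vertical.append(v_row)
        diagonal.append(d_row)
    return approx, horizontal, vertical, diagonal


def multiscale_dwt(image, levels):
    """Apply haar_dwt2 recursively to the approximation band.

    Returns the coarsest approximation and a list of
    (horizontal, vertical, diagonal) tuples, finest scale first.
    """
    details = []
    approx = image
    for _ in range(levels):
        approx, h, v, d = haar_dwt2(approx)
        details.append((h, v, d))
    return approx, details
```

In a fusion pipeline of the kind described, the optical and SAR inputs would each be decomposed this way, so that the approximation bands and the directional detail bands can be processed by separate branches before the inverse transform.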
Wang et al. (Sun,) studied this question.