Fusion-based hyperspectral image super-resolution (HSI-SR) built on diffusion models has shown promising performance in generating high-quality, realistic features. However, existing methods face two limitations: (1) static conditional guidance is mismatched with the dynamic denoising process, and (2) simple feature concatenation does not adequately resolve conflicts between modalities. To address these challenges, we propose a Modulated Diffusion Framework with Spatial–Spectral Disentangled Guidance (SSDG). The framework introduces a Dynamic Modulated Residual Network (DMRN), which uses a time-aware mechanism to adjust how conditional features are injected at each timestep, providing adaptive guidance across all denoising stages. Furthermore, we design a training-free SSDG strategy that explicitly decouples spatial and spectral guidance during sampling, enabling flexible control over the fusion process and mitigating modality conflicts. Extensive experiments on three public datasets show that the proposed method achieves state-of-the-art performance and superior robustness, particularly in challenging noisy scenarios.
Xu et al. studied this question.
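The two mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the authors' DMRN or SSDG; it is a hypothetical stand-in under stated assumptions. Time-aware injection is modeled as FiLM-style modulation, where per-channel scale and shift are computed from a sinusoidal timestep embedding, so the same conditional features are weighted differently at each denoising step. Disentangled guidance is modeled in the spirit of classifier-free guidance with a separate weight per condition: the denoiser is assumed to be evaluated once unconditionally, once with only the spatial condition, and once with only the spectral condition. The names `modulate`, `disentangled_guidance`, `w_spa`, and `w_spe` are illustrative only.

```python
import math
from typing import List

Vector = List[float]


def timestep_embedding(t: int, dim: int) -> Vector:
    """Sinusoidal embedding of diffusion timestep t (dim assumed even)."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


def modulate(cond: Vector, t: int, w_scale: List[Vector], w_shift: List[Vector]) -> Vector:
    """Hypothetical time-aware injection (FiLM-style, not the paper's DMRN).

    Each row of w_scale / w_shift maps the timestep embedding to the scale
    or shift of one conditional channel, so guidance strength varies with t.
    """
    emb = timestep_embedding(t, len(w_scale[0]))
    scale = [sum(w * e for w, e in zip(row, emb)) for row in w_scale]
    shift = [sum(w * e for w, e in zip(row, emb)) for row in w_shift]
    return [c * (1.0 + s) + b for c, s, b in zip(cond, scale, shift)]


def disentangled_guidance(
    eps_uncond: Vector, eps_spa: Vector, eps_spe: Vector, w_spa: float, w_spe: float
) -> Vector:
    """Hypothetical training-free guidance with separate spatial/spectral weights.

    eps = eps_uncond + w_spa * (eps_spa - eps_uncond) + w_spe * (eps_spe - eps_uncond)
    """
    return [
        u + w_spa * (a - u) + w_spe * (b - u)
        for u, a, b in zip(eps_uncond, eps_spa, eps_spe)
    ]


if __name__ == "__main__":
    # Toy noise predictions for a 2-element "pixel" (placeholder values).
    eps = disentangled_guidance([0.0, 0.0], [1.0, 0.5], [2.0, -1.0], w_spa=1.5, w_spe=0.5)
    print(eps)
```

Because the two weights are applied only at sampling time, they can be tuned without retraining; for example, raising `w_spe` relative to `w_spa` favors spectral fidelity over spatial detail in this sketch.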