Diffusion models demonstrate strong potential for improving image quality and preserving fine-grained details in medical image fusion. However, existing diffusion-based fusion methods lack modality-specific feature estimation during the iterative denoising process, resulting in the gradual entanglement of features. We propose a state-feedback optimization-based diffusion (STFO-Diff) framework, which extracts the generative state and enables accurate feedback learning, thereby achieving fine-grained modeling of modality-specific image generation. Specifically, we design a state measurement module (SMM) comprising two key components: a sparse basis decomposer (SBD) and a modality perception decomposer (MPD). The SBD provides a quantifiable physical representation of the fusion state by decomposing the intermediate fused image into distinct components. In parallel, the MPD provides a modality-aware evaluation by estimating the preservation and completeness of modality-specific information. In STFO-Diff, the estimated states are used to optimize and supervise the reverse diffusion denoising process, where a state feedback mechanism adaptively maintains the integrity of modality-specific information in the fused image. Extensive experiments on multiple benchmark datasets validate the effectiveness of the proposed method, demonstrating its superior performance in medical image fusion tasks. The source code is available at https://github.com/zhanglabNKU/STFO-DIff.
Cheng et al. (Thu,) studied this question.