Existing Infrared and Visible Image Fusion (IVIF) methods typically assume high-quality inputs. When handling degraded images, however, they rely heavily on manually switching between different pre-processing techniques. Decoupling degradation handling from image fusion in this way significantly reduces fusion performance. In this paper, we propose a novel VLM-Guided Degradation-Coupled Fusion network (VGDCFusion), which tightly couples degradation modeling with the fusion process and leverages vision-language models (VLMs) for degradation-aware perception and guided suppression. Specifically, the proposed Specific-Prompt Degradation-Coupled Extractor (SPDCE) enables modality-specific degradation awareness and jointly models degradation suppression and intra-modal feature extraction. In parallel, the Joint-Prompt Degradation-Coupled Fusion (JPDCF) module facilitates cross-modal degradation perception and couples residual degradation filtering with complementary cross-modal feature fusion. Extensive experiments show that VGDCFusion significantly outperforms existing state-of-the-art methods on degraded image fusion tasks, in both qualitative visual quality and quantitative evaluation metrics (e.g., average improvements of approximately 15% and 14.75% in the AG and SF measures, respectively). Our code is available at https://github.com/Lmmh058/VGDCFusion.
Zhao et al. (Wed,) studied this question.
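For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how AG (average gradient) and SF (spatial frequency) are commonly computed on a single-channel fused image. It is not taken from the VGDCFusion repository; the function names `average_gradient` and `spatial_frequency` are hypothetical, and the normalization (averaging over the number of valid pixel differences) is one common convention that may differ from the evaluation code used in the paper.

```python
import numpy as np


def average_gradient(img):
    """AG: mean of sqrt((dx^2 + dy^2) / 2) over forward differences.

    Assumes a 2-D grayscale array; this is an illustrative convention,
    not necessarily the one used in the paper's evaluation.
    """
    img = np.asarray(img, dtype=np.float64)
    dx = img[:-1, 1:] - img[:-1, :-1]  # horizontal forward difference
    dy = img[1:, :-1] - img[:-1, :-1]  # vertical forward difference
    return float(np.mean(np.sqrt((dx ** 2 + dy ** 2) / 2.0)))


def spatial_frequency(img):
    """SF: sqrt(RF^2 + CF^2), with RF and CF the RMS row/column differences."""
    img = np.asarray(img, dtype=np.float64)
    rf = np.sqrt(np.mean((img[:, 1:] - img[:, :-1]) ** 2))  # row frequency
    cf = np.sqrt(np.mean((img[1:, :] - img[:-1, :]) ** 2))  # column frequency
    return float(np.sqrt(rf ** 2 + cf ** 2))


if __name__ == "__main__":
    # Synthetic stand-in for a fused image (random data, illustration only).
    rng = np.random.default_rng(0)
    fused = rng.random((64, 64))
    print("AG:", average_gradient(fused))
    print("SF:", spatial_frequency(fused))
```

Both measures grow with the amount of edge and texture detail in the image, so higher values are usually read as sharper fusion results. A relative improvement such as the roughly 15% quoted above would then be computed as `(ours - baseline) / baseline` on these scores.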