What does this research mean for the field?

The WaveFusion model, a wavelet vision transformer with a saliency-guided loss strategy, achieves a superior balance of performance and cost-efficiency in multi-modal image fusion and improves outcomes in downstream tasks such as semantic segmentation and object detection. Novelty: methodological. Consensus alignment: neutral.

March 11, 2025

WaveFusion: A Novel Wavelet Vision Transformer With Saliency-Guided Enhancement for Multimodal Image Fusion


Abstract

Multi-modal image fusion aims to combine pivotal information from multiple sensor sources into an informative visual representation of a scene. Rapid and precise fusion is crucial for practical applications such as autonomous driving and medical diagnostics. The primary challenge, however, lies in balancing computational cost against the effectiveness of feature extraction while ensuring robust integration of salient features across modalities. This paper introduces WaveFusion, a wavelet vision transformer equipped with a saliency-guided loss strategy for multi-modal image fusion. First, to represent multi-modal data comprehensively and efficiently, we introduce an adaptive wavelet transform module for feature decomposition and reconstruction; self-attention mechanisms and convolutional networks are then applied in parallel to process the low-frequency and high-frequency components, yielding a wavelet-enhanced vision transformer. Second, WaveFusion uses a dual-aggregation attention approach that improves cross-modal feature complementarity and intra-modal feature coherence within a single fusion module. Third, we propose a dynamic saliency-informed selective loss function that refines the optimization process, with the aim of retaining critical features while maintaining overall image consistency across fusion scenarios. The efficacy and versatility of the method are validated on both infrared-visible fusion and medical image fusion tasks. Experimental results demonstrate that WaveFusion provides a superior balance between fusion performance and cost-efficiency. It also improves performance in downstream tasks such as multi-modal semantic segmentation and object detection.
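The abstract does not specify how WaveFusion's adaptive wavelet works. As an illustration of the decomposition and reconstruction step only, here is a minimal sketch that substitutes a fixed, single-level orthonormal 2D Haar transform. This is an assumption, not the paper's adaptive module. The transform splits an image into one low-frequency band (LL) and three high-frequency detail bands (LH, HL, HH), each at half resolution. The inverse then reconstructs the input exactly. The function names `haar_dwt2` and `haar_idwt2` are hypothetical. The abstract's ordering suggests, without stating outright, that the low-frequency band is handled by self-attention and the detail bands by convolution.

```python
from typing import List, Tuple

Matrix = List[List[float]]

# Hypothetical stand-in: a fixed Haar basis, NOT WaveFusion's adaptive wavelet.


def haar_dwt2(img: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """Single-level orthonormal 2D Haar DWT of an image with even height and width.

    Returns (LL, LH, HL, HH): one low-frequency approximation band and three
    high-frequency detail bands, each at half the input resolution.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must both be even")
    ll: Matrix = []
    lh: Matrix = []
    hl: Matrix = []
    hh: Matrix = []
    for i in range(0, h, 2):
        row_ll, row_lh, row_hl, row_hh = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]          # top-left, top-right
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom-left, bottom-right
            row_ll.append((a + b + c + d) / 2)  # local average (low frequency)
            row_lh.append((a + b - c - d) / 2)  # top minus bottom: horizontal edges
            row_hl.append((a - b + c - d) / 2)  # left minus right: vertical edges
            row_hh.append((a - b - c + d) / 2)  # diagonal detail
        ll.append(row_ll)
        lh.append(row_lh)
        hl.append(row_hl)
        hh.append(row_hh)
    return ll, lh, hl, hh


def haar_idwt2(ll: Matrix, lh: Matrix, hl: Matrix, hh: Matrix) -> Matrix:
    """Inverse of haar_dwt2: rebuilds the full-resolution image from the four bands."""
    h, w = len(ll), len(ll[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            low, d_lh, d_hl, d_hh = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            out[2 * i][2 * j] = (low + d_lh + d_hl + d_hh) / 2
            out[2 * i][2 * j + 1] = (low + d_lh - d_hl - d_hh) / 2
            out[2 * i + 1][2 * j] = (low - d_lh + d_hl - d_hh) / 2
            out[2 * i + 1][2 * j + 1] = (low - d_lh - d_hl + d_hh) / 2
    return out


if __name__ == "__main__":
    img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    ll, lh, hl, hh = haar_dwt2(img)
    print(ll)                                 # [[7.0, 11.0], [23.0, 27.0]]
    print(haar_idwt2(ll, lh, hl, hh) == img)  # True
```

Each band holds a quarter of the input's pixels. A self-attention branch that sees only LL therefore processes four times fewer tokens. This is a general property of a single-level 2D wavelet split and one plausible source of cost savings, not a figure reported for WaveFusion.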
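The abstract does not give the formula for the dynamic saliency-informed selective loss. As a rough illustration of what "saliency-informed selective" supervision can mean, here is a minimal sketch under loudly stated assumptions. First, saliency is approximated by each pixel's absolute deviation from its image's mean. This is a hypothetical proxy, not WaveFusion's saliency measure. Second, the loss is the mean absolute error between the fused image and a per-pixel target, taken from whichever source is more salient at that pixel. The names `mean_deviation_saliency` and `saliency_selective_l1` are hypothetical. The "dynamic" weighting and the image-consistency terms mentioned in the abstract are not modeled.

```python
from typing import List

Matrix = List[List[float]]


def mean_deviation_saliency(img: Matrix) -> Matrix:
    """Hypothetical saliency proxy: each pixel's absolute deviation from the image mean."""
    n = sum(len(row) for row in img)
    mean = sum(sum(row) for row in img) / n
    return [[abs(p - mean) for p in row] for row in img]


def saliency_selective_l1(fused: Matrix, src_a: Matrix, src_b: Matrix) -> float:
    """Hypothetical selective intensity loss (not the paper's exact formulation).

    At every pixel, the target intensity is taken from whichever source image is
    more salient there (ties go to src_a). The loss is the mean absolute error
    between the fused image and that per-pixel target.
    """
    sal_a = mean_deviation_saliency(src_a)
    sal_b = mean_deviation_saliency(src_b)
    total, count = 0.0, 0
    for i, row in enumerate(fused):
        for j, f in enumerate(row):
            # Selective step: supervise this pixel with the more salient source.
            target = src_a[i][j] if sal_a[i][j] >= sal_b[i][j] else src_b[i][j]
            total += abs(f - target)
            count += 1
    return total / count


if __name__ == "__main__":
    infrared = [[0, 0], [0, 8]]   # a bright "hot" target in one corner
    visible = [[0, 10], [0, 10]]  # a bright right half
    print(saliency_selective_l1([[0, 10], [0, 8]], infrared, visible))  # 0.0
    print(saliency_selective_l1(visible, infrared, visible))            # 0.5
```

In this toy case, the infrared pixel valued 8 stands out more from its image mean than the corresponding visible pixel does. The loss therefore penalizes a fused result that copies the visible value there, even though the visible image dominates elsewhere.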


Authors

Qinghua Wang

Jiangsu University

Ziwei Li

University of Science and Technology of China

Shuqi Zhang

Beijing University of Chinese Medicine

Journals

IEEE Transactions on Circuits and Systems for Video Technology

Institutions

Tsinghua University

Fudan University
