Multi-modal image fusion aims to combine the key information captured by different sensors into a single, informative visual representation of a scene. Fast and accurate fusion is crucial for practical applications such as autonomous driving and medical diagnosis. The central challenge is to balance computational cost against the effectiveness of feature extraction while still integrating salient features robustly across modalities. This paper introduces WaveFusion, a wavelet vision transformer trained with a saliency-guided loss strategy for multi-modal image fusion. First, to represent multi-modal data comprehensively and efficiently, we introduce an adaptive wavelet transform module for feature decomposition and reconstruction. Self-attention mechanisms and convolutional networks are then applied in parallel to process the low-frequency and high-frequency components, yielding a wavelet-enhanced vision transformer. Second, WaveFusion uses a dual-aggregation attention approach that improves both cross-modal feature complementarity and intra-modal feature coherence within a single fusion module. Third, we propose a dynamic saliency-informed selective loss function that refines the optimization process, aiming to retain critical features while maintaining overall image consistency across fusion scenarios. The efficacy and versatility of the method are validated on both infrared-visible fusion and medical image fusion tasks. Experimental results demonstrate that WaveFusion achieves a superior balance between fusion performance and cost-efficiency, and also improves performance in downstream tasks such as multi-modal semantic segmentation and object detection.
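The abstract describes decomposing features with a wavelet transform into low- and high-frequency components, processing each in its own branch, and reconstructing the result. The sketch below illustrates only that decompose/fuse/reconstruct idea, under loudly stated assumptions: a fixed single-level Haar wavelet stands in for the paper's adaptive (learnable) wavelet module, and a classical hand-crafted rule (average the low-frequency band, keep the larger-magnitude high-frequency coefficient) stands in for the self-attention and convolutional branches. The function names `haar_dwt2`, `haar_idwt2`, and `fuse_wavelet` are hypothetical and do not come from the WaveFusion code.

```python
# Sketch only: a fixed, single-level orthonormal Haar wavelet stands in for
# WaveFusion's adaptive (learnable) wavelet module, and the fusion rule below is
# a classical hand-crafted one, not the paper's attention/convolution branches.
from typing import List, Tuple

Matrix = List[List[float]]
Bands = Tuple[Matrix, Matrix, Matrix, Matrix]


def haar_dwt2(x: Matrix) -> Bands:
    """Decompose an even-sized image into (LL, column-detail, row-detail, diagonal-detail)."""
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    ll, d_col, d_row, d_diag = [], [], [], []
    for i in range(0, h, 2):
        r_ll, r_col, r_row, r_diag = [], [], [], []
        for j in range(0, w, 2):
            # 2x2 block:  a b
            #             c d
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            r_ll.append((a + b + c + d) / 2)    # low-frequency (scaled local mean)
            r_col.append((a - b + c - d) / 2)   # difference between the two columns
            r_row.append((a + b - c - d) / 2)   # difference between the two rows
            r_diag.append((a - b - c + d) / 2)  # diagonal difference
        ll.append(r_ll)
        d_col.append(r_col)
        d_row.append(r_row)
        d_diag.append(r_diag)
    return ll, d_col, d_row, d_diag


def haar_idwt2(ll: Matrix, d_col: Matrix, d_row: Matrix, d_diag: Matrix) -> Matrix:
    """Invert haar_dwt2 (the 2x2 Haar basis is orthonormal)."""
    h, w = len(ll), len(ll[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            s, p, q, r = ll[i][j], d_col[i][j], d_row[i][j], d_diag[i][j]
            out[2 * i][2 * j] = (s + p + q + r) / 2          # a
            out[2 * i][2 * j + 1] = (s - p + q - r) / 2      # b
            out[2 * i + 1][2 * j] = (s + p - q - r) / 2      # c
            out[2 * i + 1][2 * j + 1] = (s - p - q + r) / 2  # d
    return out


def fuse_wavelet(img_a: Matrix, img_b: Matrix) -> Matrix:
    """Classical rule: average the LL band, keep the larger-magnitude detail coefficient."""
    bands_a, bands_b = haar_dwt2(img_a), haar_dwt2(img_b)
    ll = [[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(bands_a[0], bands_b[0])]
    details = [
        [[x if abs(x) >= abs(y) else y for x, y in zip(ra, rb)] for ra, rb in zip(ba, bb)]
        for ba, bb in zip(bands_a[1:], bands_b[1:])
    ]
    return haar_idwt2(ll, *details)


if __name__ == "__main__":
    img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    bands = haar_dwt2(img)
    print(bands[0])            # [[7.0, 11.0], [23.0, 27.0]]
    print(haar_idwt2(*bands))  # reconstructs img (as floats)
```

Because the Haar basis is orthonormal, `haar_idwt2` inverts `haar_dwt2` up to floating-point rounding, so any change in the fused output comes from the per-band fusion rule alone. In a learned design like the one the abstract describes, that per-band rule is the part a network would take over.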
Qinghua Wang
Jiangsu University
Ziwei Li
University of Science and Technology of China
Shuqi Zhang
Beijing University of Chinese Medicine
IEEE Transactions on Circuits and Systems for Video Technology
Tsinghua University
Fudan University
Wang et al. studied this question.
synapsesocial.com/papers/6a1d3aa85a0c5c56ea04cc73 — DOI: https://doi.org/10.1109/tcsvt.2025.3549459