April 1, 2026 | Open Access

Cascade Registration and Fusion for Unaligned Infrared and Visible Images in Autonomous Driving

Key Points

Proposes a cascaded deep fusion framework that improves the alignment and integration of infrared and visible images for autonomous driving.
Constructs a dual-modal perception dataset with realistic physical parallax and non-rigid deformations.
Uses the Static-Feature Recursive Registration (SFRR) network to explicitly correct parallax-induced misalignment, establishing geometric consistency before fusion.
Uses the Hierarchical Invertible Block Fusion (HIBF) network, which combines spatial frequency separation with invertible interaction, for lossless integration of cross-modal features.
Outperforms representative algorithms on Mutual Information (MI), Visual Information Fidelity (VIF), Structural Similarity (SSIM), and Correlation Coefficient (CC).
Produces high-quality fused images with clearly defined structures.
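The geometric-correction step can be pictured with a minimal sketch, assuming the registration stage outputs a dense per-pixel displacement field that is applied by backward warping with bilinear interpolation, as is common for learned registration networks. This is not the SFRR network: the names `warp` and `bilinear_sample`, the `(dy, dx)` field layout, and the zero padding outside the image are hypothetical choices made for illustration.

```python
import math

# Hypothetical sketch of geometric correction by backward warping; this is a
# generic technique, not the SFRR network. Images are lists of rows, and
# flow[r][c] = (dy, dx) is an assumed per-pixel displacement layout.


def bilinear_sample(img, y, x):
    """Bilinearly sample img at fractional (y, x), treating pixels outside as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0

    def px(r, c):
        # Zero padding outside the image bounds (an assumption).
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    top = (1 - dx) * px(y0, x0) + dx * px(y0, x0 + 1)
    bottom = (1 - dx) * px(y0 + 1, x0) + dx * px(y0 + 1, x0 + 1)
    return (1 - dy) * top + dy * bottom


def warp(moving, flow):
    """Backward warp: out[r][c] samples moving at (r + dy, c + dx)."""
    h, w = len(moving), len(moving[0])
    return [
        [bilinear_sample(moving, r + flow[r][c][0], c + flow[r][c][1]) for c in range(w)]
        for r in range(h)
    ]


if __name__ == "__main__":
    # The moving image is the fixed image [[0,1,2,3],[4,5,6,7]] shifted one
    # column to the right, so a constant flow of (0, +1) pulls it back.
    moving = [[0, 0, 1, 2], [0, 4, 5, 6]]
    flow = [[(0.0, 1.0)] * 4 for _ in range(2)]
    print(warp(moving, flow))  # [[0.0, 1.0, 2.0, 0.0], [4.0, 5.0, 6.0, 0.0]]
```

In the usage lines, the rightmost column has no source pixel and falls back to zero padding. A constant field only models a rigid shift; handling the non-rigid deformations described above would require a spatially varying field, which is what a learned registration network would predict.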

Abstract

Infrared and visible image fusion is a critical technology for enhancing the all-weather perception capabilities of autonomous driving systems. However, the inherent physical parallax between vehicle-mounted sensors, combined with motion-induced vibration, makes strict alignment between the source images difficult to achieve. Directly fusing such misaligned pairs produces ghosting artifacts, which significantly compromise driving safety. To address this challenge, this paper proposes a cascaded deep fusion framework tailored to autonomous driving scenarios. A dual-modal perception dataset is first constructed that incorporates realistic physical parallax and non-rigid deformations. A decoupled strategy of geometric correction followed by semantic fusion is then established: the Static-Feature Recursive Registration (SFRR) network explicitly corrects the spatial misalignments caused by parallax, thereby establishing geometric consistency, after which the Hierarchical Invertible Block Fusion (HIBF) network achieves lossless integration of cross-modal features by combining spatial frequency separation with invertible interaction techniques. Experimental results demonstrate that the proposed method outperforms representative algorithms on several metrics, including Mutual Information (MI), Visual Information Fidelity (VIF), Structural Similarity (SSIM), and Correlation Coefficient (CC), and produces high-quality fused images with clearly defined structures.
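Why an invertible block can be called lossless is easiest to see in a minimal sketch. It assumes two generic ingredients that the abstract names only at a high level: a frequency split (here a 3x3 mean blur as the low-frequency band and the residual as the high-frequency band) and an additive coupling layer in the style of NICE, where each stream is updated by adding a function of the other. Neither is claimed to match HIBF; `blur3`, `split_bands`, `add`, `couple`, `uncouple`, and the interaction functions `f` and `g` are hypothetical.

```python
import math

# Hypothetical sketch, not the HIBF network: (1) split an image into a
# low-frequency band (3x3 mean blur) and a high-frequency residual, and
# (2) mix infrared and visible streams with an additive coupling layer,
# which is invertible by construction whatever f and g compute.


def blur3(img):
    """3x3 mean filter with edge clamping (a stand-in low-pass filter)."""
    h, w = len(img), len(img[0])

    def px(r, c):
        return img[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    return [[sum(px(r + i, c + j) for i in (-1, 0, 1) for j in (-1, 0, 1)) / 9.0
             for c in range(w)] for r in range(h)]


def add(a, b, sign=1.0):
    """Element-wise a + sign * b for two images of equal shape."""
    return [[p + sign * q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def split_bands(img):
    """Return (low, high) with low + high == img up to rounding."""
    low = blur3(img)
    return low, add(img, low, -1.0)


def f(x):  # hypothetical interaction function; need not be invertible
    return [[0.5 * math.tanh(v) for v in row] for row in x]


def g(x):  # another hypothetical, deliberately non-invertible function
    return [[0.25 * v * v for v in row] for row in x]


def couple(x_ir, x_vis):
    y_ir = add(x_ir, f(x_vis))   # infrared stream absorbs visible context
    y_vis = add(x_vis, g(y_ir))  # visible stream absorbs updated infrared
    return y_ir, y_vis


def uncouple(y_ir, y_vis):
    x_vis = add(y_vis, g(y_ir), -1.0)  # undo the second update first
    x_ir = add(y_ir, f(x_vis), -1.0)
    return x_ir, x_vis


if __name__ == "__main__":
    ir = [[0.0, 1.0, 4.0], [2.0, 3.0, 5.0]]
    vis = [[1.0, 0.5, 0.0], [0.2, 0.8, 1.5]]
    (ir_lo, ir_hi), (vis_lo, vis_hi) = split_bands(ir), split_bands(vis)
    # Couple matching bands, invert, and recombine the infrared bands.
    lo = uncouple(*couple(ir_lo, vis_lo))
    hi = uncouple(*couple(ir_hi, vis_hi))
    ir_rec = add(lo[0], hi[0])
    # Maximum reconstruction error: floating-point rounding level.
    print(max(abs(a - b) for ra, rb in zip(ir_rec, ir) for a, b in zip(ra, rb)))
```

Because each update only adds a function of the other stream, `uncouple` can subtract it back regardless of what `f` and `g` compute, so the layer discards no information. In a trained network, `f` and `g` would typically be learned sub-networks, and a further stage would still have to map the coupled features to a single fused image.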
