April 15, 2024

Multimodal Pedestrian Detection based on Cross-Modality Reference Search

Abstract

Pedestrian detection in thermal and visible images is crucial for applications such as surveillance, driver assistance, and autonomous driving. In this paper, we propose a novel fusion scheme that effectively integrates multimodal features to improve detection performance. Our approach relies on a Cross-Modality Reference Module (CMRM) that exchanges complementary features extracted from the different modalities, which addresses the problem of incorrect sensor dominance in rare contexts that were not covered during training. We also use modality-specific region proposal networks to search for potential candidates separately in each modality, ensuring accurate and reliable proposals. The region proposals are fused by a Multimodal Fusion Module (MFM), which employs an attention mechanism to combine features according to their attention scores. To improve the robustness of the model in practical scenarios, we introduce a set of new data augmentation techniques that simulate real-world challenges. Experimental evaluations on the public KAIST, CVC-14, and FLIR datasets demonstrate the effectiveness of the proposed method. The results show that our fusion scheme significantly outperforms existing methods, improving detection performance by as much as 16.4% in practical scenarios.
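The abstract does not specify how the MFM computes or applies its attention scores. As a minimal illustration of the general idea of combining features according to attention scores, the following sketch assumes one scalar score per modality, normalizes the two scores with a softmax, and takes a weighted sum of the visible and thermal feature vectors. The function names `softmax` and `fuse_features`, the scalar scores, and the weighted-sum form are all assumptions made for illustration, not the paper's actual module.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(
    visible: Sequence[float],
    thermal: Sequence[float],
    score_visible: float,
    score_thermal: float,
) -> List[float]:
    """Hypothetical attention-weighted fusion of two feature vectors.

    Assumption: each modality gets a single scalar attention score.
    The scores are turned into weights that sum to 1, and the fused
    feature is the weighted sum of the two inputs.
    """
    if len(visible) != len(thermal):
        raise ValueError("feature vectors must have the same length")
    w_vis, w_thr = softmax([score_visible, score_thermal])
    return [w_vis * v + w_thr * t for v, t in zip(visible, thermal)]


# Equal scores give equal weights, so the result is the element-wise mean.
fused = fuse_features([1.0, 2.0], [3.0, 4.0], 0.0, 0.0)
print(fused)  # [2.0, 3.0]
```

With unequal scores the modality with the higher score contributes more to the fused feature, which is the behavior one would want when, for example, the visible image is underexposed at night and the thermal features are more reliable.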
