June 1, 2022

CAT-Det: Contrastively Augmented Transformer for Multimodal 3D Object Detection

Abstract

In autonomous driving, LiDAR point-clouds and RGB images are two major data modalities that provide complementary cues for 3D object detection. However, it is difficult to fully exploit both because of the large discrepancies between the modalities. To address this issue, we propose a novel framework, namely the Contrastively Augmented Transformer for multimodal 3D object Detection (CAT-Det). Specifically, CAT-Det adopts a two-stream structure consisting of a Pointformer (PT) branch and an Imageformer (IT) branch, along with a Cross-Modal Transformer (CMT) module. PT, IT and CMT jointly encode intra-modal and inter-modal long-range contexts to represent an object, thereby fully exploring the multimodal information for detection. Furthermore, we propose an effective One-way Multimodal Data Augmentation (OMDA) approach based on hierarchical contrastive learning at both the point and object levels. OMDA significantly improves accuracy while augmenting only the point-clouds, which avoids the complex generation of paired samples across the two modalities. Extensive experiments on the KITTI benchmark show that CAT-Det achieves new state-of-the-art performance, highlighting its effectiveness.
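
The abstract says CMT encodes inter-modal context between the point-cloud and image streams, but it does not give the module's equations. As a rough illustration of the general idea only, the sketch below shows generic single-head scaled dot-product cross-attention in which point features act as queries and image features act as keys and values. This is an assumption for illustration, not CAT-Det's actual CMT: real transformer blocks add learned query/key/value projections, multiple heads, normalization and feed-forward layers, and the function and variable names here (cross_modal_attention, point_feats, image_feats) are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    # Subtract the row maximum for numerical stability.
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(point_feats: Matrix, image_feats: Matrix) -> Matrix:
    """Hypothetical generic cross-attention (NOT the paper's CMT):
    queries come from point features, keys and values from image features.
    No learned projections; a single head; both inputs share dimension d."""
    d = len(point_feats[0])
    scores = matmul(point_feats, transpose(image_feats))      # (n_points x n_pixels)
    scale = 1.0 / math.sqrt(d)
    weights = [softmax([s * scale for s in row]) for row in scores]
    attended = matmul(weights, image_feats)                   # (n_points x d)
    # Residual connection: each point keeps its own feature plus image context.
    return [[p + a for p, a in zip(prow, arow)]
            for prow, arow in zip(point_feats, attended)]


points = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy point-cloud features
pixels = [[1.0, 0.0], [0.0, 1.0]]              # toy image features
fused = cross_modal_attention(points, pixels)
print(len(fused), len(fused[0]))
```

Each point feature is enriched with a softmax-weighted mix of image features, so points attend most to image features they are most similar to; in a learned model the projections decide what "similar" means across the two modalities.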

Authors

Yanan Zhang

Jiaxin Chen

Di Huang

Institutions

Beihang University

State Key Laboratory of Software Development Environment
