Abstract
LiDAR-based place recognition (LPR) is a crucial technology for autonomous driving, enabling reliable localisation in GPS-denied environments. LPR recognises a place and localises the vehicle by querying a database of previously visited places for the nearest neighbours of the current scan's descriptor. Existing deep learning-based LPR approaches typically take one of several point cloud representations as input and extract features using convolutional neural networks (CNNs) or Transformer architectures. Recently, the Mamba model, which integrates deep learning with the state space model (SSM), has shown remarkable potential in modelling long-range dependencies. Motivated by this, we propose MBRNet, a novel multi-view feature fusion network that, for the first time, incorporates the Mamba architecture into LiDAR-based multi-view feature extraction and fusion. MBRNet takes range image views (RIVs) and bird's-eye views (BEVs) as input, which provide complementary spatial representations from different perspectives. A Multi-Scale Feature Extraction Mamba (MSfe-Mamba) module captures spatial features at multiple scales. Semantic information from both views is then aligned and fused by a Mamba Fusion module, producing a robust and discriminative global descriptor that forms the basis for accurate place recognition. We conduct extensive experiments on the public KITTI dataset and on our self-collected HUE dataset, which covers over 4.2 km of urban roads. Our method achieves an F1max of 0.988 on the KITTI dataset and the highest AUC, 0.895, on the HUE dataset. Descriptor generation takes only 13.49 ms, highlighting the computational efficiency of our approach. Experimental results demonstrate that MBRNet outperforms state-of-the-art methods in recognition accuracy, viewpoint robustness, and runtime efficiency, showing strong potential for reliable place recognition in complex real-world environments.
Sun et al. (Tue,) studied this question.
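The nearest-neighbour query that LPR relies on can be illustrated with a minimal sketch. It is not MBRNet's implementation: the function names, the toy 3-dimensional descriptors, the use of cosine similarity, and the 0.8 acceptance threshold are all assumptions made for illustration, and a real system would compare the learned global descriptors, typically with an indexed search structure.

```python
import math


def l2_normalise(v):
    """Scale a descriptor to unit length (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Cosine similarity between two descriptors of equal length."""
    a_n, b_n = l2_normalise(a), l2_normalise(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def retrieve_place(query, database, threshold=0.8):
    """Brute-force nearest-neighbour search over stored descriptors.

    Returns (index, similarity) of the best match, or (None, similarity)
    when the best match falls below the assumed acceptance threshold.
    """
    if not database:
        raise ValueError("database is empty")
    sims = [cosine_similarity(query, d) for d in database]
    best = max(range(len(sims)), key=sims.__getitem__)
    if sims[best] < threshold:
        return None, sims[best]
    return best, sims[best]


# Toy database: one descriptor per previously visited place (made-up values).
database = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query = [0.1, 0.9, 0.0]  # descriptor of the current scan (made-up values)

match_index, similarity = retrieve_place(query, database)
print(match_index, round(similarity, 3))
```

Sweeping the acceptance threshold trades precision against recall, which is how threshold-free summary metrics such as a maximum F1 score are usually obtained in place recognition evaluations.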