September 10, 2025 (Open Access)

MBRNet: A Multi-View Feature Fusion Mamba Network for LiDAR-Based Place Recognition

Key Points

MBRNet achieved an F1max of 0.988 on the KITTI dataset, marking a significant improvement in recognition accuracy.
Combining multi-view feature fusion with the Mamba model improves spatial representation and, in turn, place recognition.
Utilizing range image views and bird's-eye views, MBRNet captures comprehensive spatial features for reliable localization.
The approach requires only 13.49 milliseconds for descriptor generation, showcasing its efficiency in real-time applications.
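
The F1max figure in the first key point is not defined on this page. A minimal sketch, assuming the definition common in the place recognition literature (the maximum F1 score over all decision thresholds applied to descriptor similarity scores), is given here. The function name `f1_max` and the toy scores and labels are hypothetical and are not taken from the paper.

```python
def f1_max(scores, labels):
    """Return the maximum F1 score over all decision thresholds.

    scores: similarity scores, where higher means "more likely the same place".
    labels: 1 for a true revisit, 0 otherwise.
    Assumption: a pair is predicted positive when its score >= threshold.
    """
    best = 0.0
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
        if tp == 0:
            continue  # precision and recall are both zero here
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Toy data, made up for illustration only (not results from the paper).
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3]
toy_labels = [1, 1, 0, 1, 0]
toy_f1 = f1_max(toy_scores, toy_labels)
```

Reporting the maximum over thresholds makes the score independent of any single operating point, which is why it is often paired with a threshold-free measure such as AUC.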

Abstract

LiDAR-based place recognition (LPR) is a crucial technology for autonomous driving to achieve reliable localisation in GPS-denied environments. LPR achieves place recognition and localisation by querying for nearest neighbours in the database. Existing deep learning-based LPR approaches typically adopt various point cloud representations as inputs and extract features using convolutional neural networks (CNNs) or Transformer architectures. Recently, the Mamba model, which integrates deep learning with the state space model (SSM), has shown remarkable potential in modelling long-range dependencies. Motivated by this, we propose MBRNet, a novel multi-view feature fusion network that, for the first time, incorporates the Mamba architecture into LiDAR-based multi-view feature extraction and fusion. Specifically, MBRNet takes range image views (RIVs) and bird's-eye views (BEVs) as input, providing complementary spatial representations from different perspectives. The Multi-Scale Feature Extraction Mamba (MSfe-Mamba) module captures spatial features at multiple scales. Subsequently, semantic information from both views is aligned and fused through a Mamba Fusion module. This fusion process generates a robust and discriminative global descriptor, forming the basis for accurate place recognition. We conduct extensive experiments on the public KITTI dataset and our self-collected HUE dataset, covering over 4.2 km of urban road scenarios. Our method achieves an F1max of 0.988 on the KITTI dataset and attains the highest AUC value of 0.895 on the HUE dataset. Moreover, descriptor generation requires only 13.49 milliseconds, highlighting the computational efficiency of our approach. Experimental results demonstrate that MBRNet outperforms state-of-the-art methods in recognition accuracy, viewpoint robustness, and runtime efficiency, showcasing its strong potential for reliable place recognition in complex real-world environments.
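
The abstract describes place recognition as a nearest-neighbour query over a database of global descriptors. The sketch below illustrates that retrieval step only. It is not the MBRNet implementation: the cosine similarity measure, the `threshold` value, the three-dimensional toy descriptors, and the names `cosine_similarity` and `query_place` are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def query_place(query, database, threshold=0.8):
    """Return (index, similarity) of the nearest database descriptor.

    The index is None when the best similarity falls below `threshold`,
    meaning the query is treated as a previously unseen place.
    The threshold of 0.8 is an arbitrary illustrative value.
    """
    best_idx, best_sim = None, -1.0
    for idx, descriptor in enumerate(database):
        sim = cosine_similarity(query, descriptor)
        if sim > best_sim:
            best_idx, best_sim = idx, sim
    if best_sim < threshold:
        return None, best_sim
    return best_idx, best_sim


# Toy descriptors, made up for illustration; real global descriptors
# would come from the network and have far more dimensions.
database = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]
match, sim = query_place([0.1, 0.99, 0.0], database)
```

A linear scan is enough for a sketch; a larger map would normally use an index structure such as a k-d tree or an approximate nearest-neighbour library.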
