Object detection and object tracking are core tasks in computer vision, aimed at identifying and localizing objects belonging to predefined categories within a scene. In recent years, the advent of the Mamba architecture has marked a significant milestone in deep learning. By building on State Space Models (SSMs), Mamba achieves computational complexity that is linear in sequence length while retaining strong long-range dependency modeling, in contrast to the quadratic complexity of self-attention in traditional Transformer architectures. Consequently, a growing number of researchers are applying Mamba to three-dimensional (3D) point clouds to improve processing efficiency. However, because point cloud data are intrinsically sparse, irregular, and unstructured, directly applying 1D sequential models to 3D spatial data faces substantial challenges, particularly in data serialization and local feature preservation. To give researchers a comprehensive view of the current status and latest advances in this field, this paper systematically reviews recent research progress in 3D point cloud algorithms based on the Mamba architecture. The survey also analyzes existing limitations regarding geometric information loss and interpretability. It concludes by outlining potential future research directions, such as learnable serialization strategies and hybrid architectures, with the aim of providing a foundational reference for developing next-generation, efficient 3D perception systems.
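To make the serialization challenge concrete, the following is a minimal Python sketch of one common fixed strategy: ordering points along a Z-order (Morton) space-filling curve so that a 1D sequence model can consume them. It is an illustration under stated assumptions, not the method of any particular surveyed work; the function names `morton_code` and `serialize_z_order` and the parameters `grid_size` and `bits` are hypothetical and chosen only for this example.

```python
def morton_code(ix, iy, iz, bits=10):
    """Interleave the low `bits` bits of three non-negative integer cell indices."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (3 * b)      # x occupies bit 3b
        code |= ((iy >> b) & 1) << (3 * b + 1)  # y occupies bit 3b + 1
        code |= ((iz >> b) & 1) << (3 * b + 2)  # z occupies bit 3b + 2
    return code


def serialize_z_order(points, grid_size=1.0, bits=10):
    """Return point indices sorted along a Z-order curve.

    `points` is a sequence of (x, y, z) tuples. `grid_size` (assumed voxel
    edge length) and `bits` (assumed resolution per axis) are illustrative.
    """
    if not points:
        return []
    # Shift coordinates so every voxel index is non-negative.
    mins = [min(p[d] for p in points) for d in range(3)]
    max_index = (1 << bits) - 1

    def key(p):
        # Quantize to voxel indices, clamping to the representable range.
        cell = [min(int((p[d] - mins[d]) / grid_size), max_index) for d in range(3)]
        return morton_code(cell[0], cell[1], cell[2], bits=bits)

    return sorted(range(len(points)), key=lambda i: key(points[i]))


if __name__ == "__main__":
    cloud = [(3.0, 3.0, 3.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    print(serialize_z_order(cloud))
```

Spatially adjacent voxels tend to receive nearby positions in the resulting sequence, but the curve makes long jumps at octant boundaries, so some geometric neighbors end up far apart. That loss of locality is the kind of issue that motivates the learnable serialization strategies mentioned above.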
Zihao Li (Thu,) studied this question.