While recent 3D object detection methods achieve impressive overall performance, they generally struggle to maintain high precision on small and distant targets. On the one hand, existing 3D backbones often lack sufficient multi-scale and channel-wise discrimination, so the rich features of large, nearby objects overwhelm the faint signals of small, distant targets during downsampling. On the other hand, although current Mamba-based architectures achieve high performance with linear computational complexity, their reliance on unidirectional scanning ignores bidirectional spatial relationships, leading to geometric distortion and the neglect of critical features of rare voxels. To address these issues, and drawing inspiration from Gated Attention as a mechanism for adaptive feature modulation, we propose GateMamba, a novel 3D backbone composed of stacked GateMamba blocks equipped with diverse feature gated mixers. To mitigate the loss of fine-grained spatial details during hierarchical downsampling, we design the GateMamba block, which incorporates a dense feature pyramid structure composed of several nested GateMamba layers and a scale feature gated mixer that adaptively weights and aggregates multi-scale features. To address the spatial distortion caused by the unidirectional scanning of standard state space models, we introduce a spatial-channel feature gated mixer within the GateMamba layer that bidirectionally aggregates spatial context and recalibrates channel responses. To prevent the features of sparse instances from vanishing during strided downsampling, we propose a dilation voxel generation strategy, which proactively synthesizes features for foreground placeholders aligned with the sampling stride. Extensive experiments on the KITTI, Waymo, ONCE, and nuScenes datasets demonstrate that GateMamba outperforms current state-of-the-art methods, achieving, for example, 73.8% and 67.5% mAP on the KITTI and ONCE datasets, respectively.
More importantly, GateMamba specifically improves detection precision for small and distant objects; for example, it outperforms the baseline by 2.5% and 2.4% in Level 1 and Level 2 mAP, respectively, for the cyclist category on the Waymo validation set. Ablation studies further validate the contribution of the diverse feature gated mixers to feature discriminability.
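To make the idea of adaptively weighting and aggregating multi-scale features concrete, the following is a minimal sketch of gated fusion for a single voxel. It is an illustrative assumption, not the GateMamba implementation: the function names `softmax` and `gated_scale_mix`, the linear gate parameterised by `gate_weights`, and the use of a softmax over scales are all hypothetical choices, and the per-scale features are assumed to be already resampled to a common resolution.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_scale_mix(scale_feats, gate_weights):
    """Fuse per-scale feature vectors of one voxel with softmax gates.

    scale_feats:  list of S feature vectors, each with C channels
                  (assumed already aligned to the same resolution).
    gate_weights: hypothetical learned vector of length C that scores
                  each scale's feature vector with a dot product.
    Returns (fused, gates): the C-channel fused vector and the S gates.
    """
    # One scalar gate logit per scale.
    logits = [sum(w * f for w, f in zip(gate_weights, feat))
              for feat in scale_feats]
    # Gates are non-negative and sum to 1 across scales.
    gates = softmax(logits)
    channels = len(scale_feats[0])
    # Gate-weighted sum of the per-scale features, channel by channel.
    fused = [sum(g * feat[c] for g, feat in zip(gates, scale_feats))
             for c in range(channels)]
    return fused, gates


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]  # three scales, two channels
    fused, gates = gated_scale_mix(feats, gate_weights=[0.5, -0.5])
    print(gates, fused)
```

With all-zero gate weights the gates are uniform and the fusion reduces to a plain average over scales; non-zero weights let the scales with higher gate scores dominate, which is the behaviour a gated mixer is meant to provide.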
Liu et al. (Tue,) studied this question.