Abstract Monocular 3D lane detection provides richer spatial information than the planar positions produced by 2D lane detection, which makes it crucial for vehicle perception in complex intelligent driving scenarios. Recent methods primarily model lanes in 3D space as anchor lines, project them onto front-view (FV) features for sampling, and directly regress 3D coordinates from 2D image features. However, the slender structure of lanes makes accurate localization in 3D space challenging. Existing frameworks struggle to integrate multi-level features well enough to capture the global spatial structure that accurate detection requires, and they have difficulty balancing detection performance against computational efficiency. To alleviate these problems, we present CM-3DLane, an efficient 3D lane detection framework. Instead of directly superimposing deep and low-level features, we propose a multi-scale information integration strategy built on a convolutional neural network (CNN) backbone that extracts local image features. We propose the Lane-Aware Mamba (LAMamba) block, which employs a tailored 2D selective scan (SS2D) strategy to model long-range spatial dependencies and global lane context with linear complexity, significantly enhancing feature extraction. It is complemented by a Cross-Scale Attention Fusion (CSAF) module that uses channel and spatial attention to fuse multi-scale features. In addition, we design a Refined Anchor Dynamic Ranking (RADR) module to preserve the most representative and informative 3D anchors. CM-3DLane achieves an F1 score of 58.3 on OpenLane and 96.5 on ApolloSim, outperforming all prior methods while remaining efficient enough for real-time deployment.
Yang et al. (Thu,) studied this question.
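
As an illustrative sketch only, and not the CM-3DLane implementation, two generic operations mentioned in the abstract can be written in a few lines: projecting 3D anchor points onto the front-view image for feature sampling, and keeping only the highest-ranked anchors. The function names, the pinhole camera model with intrinsics `K` and extrinsics `R`, `t`, the camera-frame convention (x right, y down, z forward), and the plain top-k selection by score are all assumptions made for illustration; the abstract does not specify how RADR ranks or refines anchors.

```python
import numpy as np

def project_anchor_points(points_3d, K, R, t):
    """Project (N, 3) points to (N, 2) pixel coordinates.

    Assumes a pinhole camera: K is the 3x3 intrinsic matrix, and R (3x3), t (3,)
    map points from the anchor frame into the camera frame (z pointing forward).
    """
    cam = points_3d @ R.T + t            # anchor frame -> camera frame
    uvw = cam @ K.T                      # camera frame -> homogeneous pixels
    return uvw[:, :2] / uvw[:, 2:3]      # perspective divide by depth

def top_k_anchors(scores, k):
    """Return the indices of the k highest-scoring anchors, best first."""
    k = min(k, scores.shape[0])
    return np.argsort(-scores, kind="stable")[:k]

# Hypothetical camera and anchor values, chosen only to exercise the functions.
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)
anchor = np.array([[0.0, 0.0, 10.0],
                   [1.0, 0.0, 10.0]])
pixels = project_anchor_points(anchor, K, R, t)
kept = top_k_anchors(np.array([0.1, 0.9, 0.5, 0.7]), k=2)
```

The projected pixel coordinates would typically be used to sample the FV feature map (for example by bilinear interpolation), and the kept indices select which anchors are passed on to later stages.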