Weakly supervised 3D object detection has emerged as a promising paradigm for reducing the reliance on costly 3D annotations. Existing methods often rely on 2D projection constraints or heuristic priors to supervise 3D box regression with inexpensive 2D labels. However, they still suffer from projection ambiguity and geometric inconsistency because the 3D parameters are optimized in an entangled manner. In this paper, we propose DiMuS, a Disentangled Multi-Signal learning framework that integrates complementary supervision from 2D boxes, LLM-derived semantic priors, and 3D geometric alignment to enhance the distinct 3D properties of position, dimension, and orientation, respectively. Specifically, DiMuS incorporates three key components: (i) a Centerness-enhanced Projection Constraint (CPC) that improves position estimation through a centerness weighting strategy, (ii) a Semantic Prior Anchoring (SPA) module that leverages LLM-derived category-specific priors for robust dimension decoding, and (iii) a Rotation-aware Consistency Regularization (RCR) mechanism that enforces orientation consistency through synthetic rotations and self-supervised invariance learning. Additionally, we propose an Adversarial Geometric Alignment (AGA) module that builds attraction/repulsion forces between LiDAR points and box edges for dynamic boundary refinement. Extensive experiments on the KITTI dataset demonstrate that DiMuS outperforms previous weakly supervised methods, achieving 96.82% of fully supervised performance on car detection while maintaining robustness across different categories.
Zhang et al. (Thu,) studied this question.