To address three inherent challenges of 3D point cloud semantic segmentation, namely the unstructured distribution of the data, inadequate capture of multi-scale geometric features, and susceptibility to noise, this paper presents a unified structure-aware learning framework that integrates local geometric modeling, dynamic denoising, and global contextual reasoning. The framework comprises three complementary core modules: 1) the Structure-Aware Neighborhood Aggregation (SANA) module encodes the fine-grained local geometric distribution with an octant-based structural descriptor and then fuses multi-scale features through multi-head aggregation, so that objects of similar scale but distinct geometry can be told apart; 2) the Structure-Driven Irrelevant Point Denoising (SDIP) module uses structural similarity to dynamically suppress noisy points and interference from across semantic boundaries, which sharpens boundaries and makes aggregation more robust; 3) the Octant-Structure Aware Transformer (OSAT) module embeds structural priors into global self-attention, so that long-range dependencies are guided jointly by semantic and geometric factors, which mitigates the over-smoothing of purely semantic attention. Extensive experiments on three benchmark datasets covering indoor and outdoor scenes (S3DIS, ScanNetV2, and Toronto3D) show that the method reaches 73.1% mIoU on S3DIS Area 5, 72.3% mIoU on the ScanNetV2 test set, and 83.8% mIoU on Toronto3D. Ablation studies confirm the effectiveness and complementarity of each module, and the results across indoor and outdoor datasets indicate that the framework generalizes well and has practical value for 3D point cloud understanding tasks.
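The abstract does not give the exact formulation of the octant descriptor, the SDIP filter, or the OSAT attention bias, so the following is only a minimal illustrative sketch under stated assumptions, not the paper's implementation. It assumes the octant descriptor is a normalized 8-bin histogram of which octant each of a point's k nearest neighbors falls into, uses histogram intersection as the "structural similarity", zeroes out neighbors whose similarity falls below a hypothetical threshold `tau` as a stand-in for SDIP, and adds that similarity, scaled by a hypothetical weight `lam`, to dot-product attention logits as a stand-in for OSAT. All function and parameter names are hypothetical.

```python
import math

def octant_index(center, neighbor):
    """Return 0..7: the octant around `center` that `neighbor` falls in (signs of dx, dy, dz)."""
    dx, dy, dz = (n - c for n, c in zip(neighbor, center))
    return (int(dx >= 0) << 2) | (int(dy >= 0) << 1) | int(dz >= 0)

def knn(points, i, k):
    """Brute-force indices of the k nearest neighbors of points[i], excluding i itself."""
    order = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return [j for _, j in order[:k]]

def octant_descriptor(points, i, k):
    """Hypothetical octant descriptor: normalized 8-bin histogram of neighbor octants."""
    nbrs = knn(points, i, k)
    hist = [0.0] * 8
    for j in nbrs:
        hist[octant_index(points[i], points[j])] += 1.0
    return [h / len(nbrs) for h in hist]

def structural_similarity(a, b):
    """Histogram intersection in [0, 1]; 1.0 means identical octant distributions."""
    return sum(min(x, y) for x, y in zip(a, b))

def denoise_weights(desc, i, nbrs, tau=0.5):
    """SDIP-style stand-in (assumption): zero out neighbors structurally dissimilar to point i."""
    weights = {}
    for j in nbrs:
        s = structural_similarity(desc[i], desc[j])
        weights[j] = s if s >= tau else 0.0
    return weights

def structure_aware_attention(feats, desc, lam=1.0):
    """OSAT-style stand-in (assumption): row-wise softmax of q.k / sqrt(d) + lam * similarity.

    Queries and keys are the raw features here; a real model would use learned projections.
    """
    d = len(feats[0])
    rows = []
    for i in range(len(feats)):
        logits = [
            sum(a * b for a, b in zip(feats[i], feats[j])) / math.sqrt(d)
            + lam * structural_similarity(desc[i], desc[j])
            for j in range(len(feats))
        ]
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(v - m) for v in logits]
        z = sum(exps)
        rows.append([e / z for e in exps])
    return rows

if __name__ == "__main__":
    # Toy cloud: a flat 3x3 grid at z = 0 plus one point lifted above its center.
    pts = [(x, y, 0.0) for x in range(3) for y in range(3)] + [(1.0, 1.0, 1.0)]
    desc = [octant_descriptor(pts, i, k=4) for i in range(len(pts))]
    attn = structure_aware_attention([[1.0, 0.0]] * len(pts), desc, lam=2.0)
    print([round(w, 3) for w in attn[0]])
```

Adding the structural term to the logits before the softmax means that two points with identical semantic features can still receive different attention weights when their local octant layouts differ, which is one plausible way to counter the over-smoothing of purely semantic attention that the abstract describes.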
Chen et al. (Mon,) studied this question.