What question did this study set out to answer?

This research aims to improve semantic segmentation of 3D point clouds by addressing issues of noise and unstructured data.

June 11, 2026 · Open Access

Structure-Aware Aggregation and Denoising for large-scale point cloud semantic segmentation

Key Points

Built a unified structure-aware learning framework that integrates local geometric modeling, dynamic denoising, and global contextual reasoning.
Developed three core modules: SANA for local geometric encoding, SDIP for noise suppression, and OSAT for global attention modeling.
Conducted experiments on three benchmark datasets (S3DIS, ScanNetV2, Toronto3D) covering indoor and outdoor scenes.
Achieved 73.1% mIoU on S3DIS Area 5, 72.3% mIoU on the ScanNetV2 test set, and 83.8% mIoU on Toronto3D.
Ablation studies confirmed that each module contributes to overall performance and that the modules complement one another.
The method generalizes well across indoor and outdoor 3D point cloud scenarios.
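
The scores reported here are mean intersection-over-union (mIoU) values. As a reminder of what that metric measures, the following is a minimal sketch of a per-class IoU average computed from a confusion matrix. The function name `mean_iou` and the choice to skip classes that appear in neither the predictions nor the labels are assumptions made for illustration; each benchmark's official evaluation script may handle ignored labels and dataset-level accumulation differently.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that occur in the predictions or the labels.

    Illustrative helper only, not the evaluation code used in the paper.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(target * num_classes + pred, minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)
    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    # Assumption: classes absent from both arrays are left out of the average.
    valid = union > 0
    return float((intersection[valid] / union[valid]).mean())
```

For example, with predictions `[0, 0, 1, 1]` and labels `[0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is their average.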

Abstract

To address the inherent challenges of 3D point cloud semantic segmentation, including unstructured data distribution, inadequate capture of multi-scale geometric features, and susceptibility to noise, this paper presents a unified structure-aware learning framework that integrates local geometric modeling, dynamic denoising, and global contextual reasoning. The framework comprises three complementary core modules: 1) the Structure-Aware Neighborhood Aggregation (SANA) module encodes fine-grained local geometric distributions using an octant-based structural descriptor, then fuses multi-scale features via multi-head aggregation to distinguish objects with similar scales but distinct geometries; 2) the Structure-Driven Irrelevant Point Denoising (SDIP) module dynamically suppresses noisy points and cross-semantic boundary interference by leveraging structural similarity, enhancing boundary clarity and aggregation robustness; 3) the Octant-Structure Aware Transformer (OSAT) module embeds structural priors into global self-attention, so that long-range dependency modeling is guided by both semantic and geometric factors, which mitigates the over-smoothing of purely semantic attention. Extensive experiments are conducted on three benchmark datasets (S3DIS, ScanNetV2, Toronto3D) covering indoor and outdoor scenarios. The proposed method achieves 73.1% mIoU on S3DIS Area 5, 72.3% mIoU on the ScanNetV2 test set, and 83.8% mIoU on Toronto3D. Ablation studies confirm the effectiveness and complementarity of each module, demonstrating the framework’s strong generalization and practical value for 3D point cloud understanding tasks.
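
The abstract does not give the exact form of the octant-based descriptor or of the structural-similarity test, so the following is only a rough sketch of the general idea under stated assumptions. It assumes the descriptor is a normalized count of neighbors in each of the eight octants around a center point, and that "structural similarity" can be approximated by cosine similarity between descriptors with a hard cutoff. The names `octant_histogram` and `structural_weights` and the `threshold` parameter are hypothetical and do not come from the paper; the actual SANA and SDIP modules are learned and presumably more elaborate.

```python
import numpy as np

def octant_histogram(center, neighbors):
    """Fraction of neighbors in each of the 8 octants around `center`.

    Hypothetical stand-in for an octant-based structural descriptor.
    """
    offsets = np.asarray(neighbors, dtype=float).reshape(-1, 3)
    offsets = offsets - np.asarray(center, dtype=float)
    # One bit per axis: 1 if the offset is non-negative along that axis.
    bits = (offsets >= 0).astype(int)
    octant = bits[:, 0] * 4 + bits[:, 1] * 2 + bits[:, 2]
    counts = np.bincount(octant, minlength=8)
    return counts / max(len(offsets), 1)

def structural_weights(center_desc, neighbor_descs, threshold=0.5):
    """Cosine similarity to the center descriptor, zeroed below `threshold`.

    Assumption: a fixed cutoff stands in for the paper's dynamic suppression.
    """
    c = np.asarray(center_desc, dtype=float)
    n = np.asarray(neighbor_descs, dtype=float)
    eps = 1e-12  # guards against division by zero for all-zero descriptors
    cos = (n @ c) / (np.linalg.norm(n, axis=1) * np.linalg.norm(c) + eps)
    return np.where(cos >= threshold, cos, 0.0)
```

In this sketch, neighbors whose local structure differs strongly from the center's receive zero weight, which is one simple way to keep points from across a semantic boundary out of the aggregation step.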
