Semantic segmentation of as-built point cloud data is a cornerstone of construction digital twins. However, existing 3D AI models are often confined to specific, well-annotated datasets and generalize poorly to complex construction environments. This paper proposes a human-inspired zero-shot framework for construction scene segmentation with a modular architecture comprising (1) an action module for hybrid pre-segmentation, (2) an observation module for dynamic viewpoint adjustment, and (3) an analysis module that uses multimodal large language models (MLLMs) for semantic interpretation. Without task-specific training, the proposed method achieved over 85% weighted IoU on real-world construction datasets, substantially outperforming state-of-the-art open-vocabulary 3D scene understanding methods (less than 40% weighted IoU). This performance highlights the approach’s potential for downstream applications such as progress monitoring, quality inspection, and human-robot collaboration. Furthermore, it establishes a new paradigm for domain-adaptive 3D scene segmentation, overcoming limitations related to data scarcity and cross-domain generalization.
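The abstract reports results as weighted IoU but does not define the metric. A minimal sketch of one common reading follows, assuming a frequency-weighted IoU in which each class's IoU is weighted by its share of ground-truth points. The function name `weighted_iou`, the integer label encoding, and the toy labels are illustrative assumptions, not the authors' evaluation code.

```python
import numpy as np


def weighted_iou(y_true, y_pred, num_classes):
    """Frequency-weighted IoU over per-point labels.

    Assumed definition: sum over classes of (class share of ground-truth
    points) * (class IoU). The paper may define the metric differently.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    total = 0.0
    for c in range(num_classes):
        gt = y_true == c
        pr = y_pred == c
        support = gt.sum()
        if support == 0:
            continue  # classes absent from the ground truth get zero weight
        inter = np.logical_and(gt, pr).sum()
        union = np.logical_or(gt, pr).sum()
        total += (support / y_true.size) * (inter / union)
    return float(total)


# Toy per-point labels (hypothetical): 8 points, 3 classes.
labels_true = [0, 0, 0, 0, 1, 1, 2, 2]
labels_pred = [0, 0, 0, 1, 1, 1, 2, 0]
score = weighted_iou(labels_true, labels_pred, num_classes=3)
perfect = weighted_iou(labels_true, labels_true, num_classes=3)
```

Under this definition, classes with many points dominate the score, so a method can score well overall while still missing small or rare elements; per-class IoU would be needed to see that.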
Wang et al. studied this question.