What question did this study set out to answer?

To develop a zero-shot framework for segmenting construction scene point clouds without task-specific training.

June 14, 2026 | Open Access

Human-inspired zero-shot semantic segmentation of construction point clouds based on adaptive observation and MLLM reasoning

Key Points

Aimed to develop a zero-shot framework for segmenting construction scene point clouds without task-specific training.
Proposed a modular architecture with action, observation, and analysis modules.
Used multimodal large language models (MLLMs) for semantic interpretation.
Evaluated performance on real-world construction datasets.
Achieved over 85% weighted IoU, compared with less than 40% weighted IoU for state-of-the-art open-vocabulary 3D scene understanding methods.
Demonstrated effective domain adaptation in 3D scene segmentation despite data scarcity.
Highlighted potential applications in progress monitoring, quality inspection, and human-robot collaboration.

Abstract

Semantic segmentation of as-built point cloud data is a cornerstone of construction digital twins. However, existing 3D AI models are often confined to specific, well-annotated datasets and suffer from poor generalization in complex construction environments. This paper proposes a human-inspired zero-shot framework for construction scene segmentation, featuring a modular architecture comprising: (1) an action module for hybrid pre-segmentation, (2) an observation module for dynamic viewpoint adjustment, and (3) an analysis module leveraging multimodal large language models (MLLMs) for semantic interpretation. Without task-specific training, the proposed method achieved over 85% weighted IoU on real-world construction datasets, drastically outperforming state-of-the-art open-vocabulary 3D scene understanding methods (less than 40% weighted IoU). This superior performance highlights the approach’s potential for downstream applications such as progress monitoring, quality inspection, and human-robot collaboration. Furthermore, it establishes a new paradigm for domain-adaptive 3D scene segmentation, effectively overcoming limitations related to data scarcity and cross-domain generalization.
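The headline metric in the abstract is weighted IoU. The abstract does not define it, so the sketch below assumes the common frequency-weighted form, in which each class's IoU is weighted by that class's share of ground-truth points. The function name `weighted_iou` and its parameters are illustrative and not taken from the paper.

```python
import numpy as np


def weighted_iou(y_true, y_pred, num_classes):
    """Frequency-weighted IoU over per-point class labels.

    Assumed definition (not confirmed by the source):
        sum_c (n_c / N) * IoU_c
    where n_c is the number of ground-truth points of class c
    and N is the total number of points.
    """
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same number of points")

    total = y_true.size
    if total == 0:
        return 0.0

    score = 0.0
    for c in range(num_classes):
        true_mask = y_true == c
        pred_mask = y_pred == c
        support = true_mask.sum()
        if support == 0:
            # Classes absent from the ground truth carry zero weight.
            continue
        intersection = np.logical_and(true_mask, pred_mask).sum()
        union = np.logical_or(true_mask, pred_mask).sum()
        score += (support / total) * (intersection / union)
    return float(score)


# Toy example with four points and two classes.
labels = [0, 0, 0, 1]
preds = [0, 0, 1, 1]
print(weighted_iou(labels, preds, num_classes=2))
```

Under this definition, classes with many points (for example, large slabs or walls) dominate the score, so a high weighted IoU can coexist with weaker results on rare classes.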
