What question did this study set out to answer?

The aim is to enhance 3D reconstruction from construction site imagery by addressing occluders and dynamic disturbances.

July 3, 2026 | Open Access

Occluder-Mask-Constrained 3D Reconstruction from Tower-Crane Construction Site Imagery

Key Points

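Aimed to improve 3D reconstruction from tower-crane construction-site imagery by handling recurring foreground occluders and dynamic disturbances.
Developed a framework that generates occluder masks with Segment Anything Model 2 (SAM2) from prompt points derived from adjacent-view geometric outliers.
Propagated the masks as pixel-validity constraints through sparse feature filtering, Adaptive Patch Deformation Multi-View Stereo (APD-MVS) matching-cost evaluation, support-region selection, and depth-map fusion.
Evaluated on three real construction-site datasets, with a representative 308-image sequence compared against no-mask reconstruction, YOLOv8 bounding-box removal, manually prompted SAM2.1, a SAM3 text-prompt baseline, and Vis-MVSNet.
Observed increased sparse-registration completeness in the tested sequences.
Observed fewer visually identifiable occluder-induced artifacts in dense point clouds than with no-mask reconstruction.
Reported improved sparse-stage registration completeness and visible artifact suppression; the dense results are treated as visual evidence only, because no high-precision 3D reference point clouds are available.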

Abstract

3D reconstruction of construction scenes is an important enabling technology for digital and intelligent construction project management. Recurring foreground occluders and dynamic disturbances in tower-crane imagery can destabilize image registration and introduce spurious depth responses. This paper proposes an occluder-mask-constrained 3D reconstruction framework driven by multi-view geometric anomalies. Adjacent-view geometric outliers are spatially aggregated to generate foreground prompt points, which are converted into occluder masks using Segment Anything Model 2 (SAM2). The masks are propagated as unified pixel-validity constraints through sparse feature filtering, Adaptive Patch Deformation Multi-View Stereo (APD-MVS) matching-cost evaluation, support-region selection, and depth-map fusion. Experiments on three real construction-site datasets show increased sparse-registration completeness in the tested sequences and fewer visually identifiable occluder-induced artifacts in dense point clouds. A representative 308-image sequence was further evaluated against no-mask reconstruction, You Only Look Once version 8 (YOLOv8) bounding-box removal, manually prompted Segment Anything Model 2.1 (SAM2.1), a Segment Anything Model 3 (SAM3) text-prompt baseline, and Visibility-Aware Multi-View Stereo Network (Vis-MVSNet). The evaluation combines sparse-reconstruction metrics, pixel-level mask-quality metrics from a manually annotated validation subset, module-wise runtime accounting, controlled ablations, and aligned dense-point-cloud visualization. These results show improved sparse-stage registration completeness and visible artifact suppression. Because high-precision 3D reference point clouds are unavailable, the dense results are interpreted as visual evidence of artifact suppression rather than as proof of improved absolute dense-reconstruction accuracy.
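Two steps named in the abstract, aggregating geometric outliers into prompt points and using the mask as a pixel-validity constraint during sparse feature filtering, can be illustrated with a minimal sketch. This is not the authors' implementation: the grid-cell aggregation, the function names `aggregate_prompt_points` and `filter_keypoints`, and the parameters `cell_size` and `min_count` are all assumptions made for illustration. The abstract does not specify how outliers are aggregated or how the mask is encoded.

```python
import math
from collections import defaultdict


def aggregate_prompt_points(outliers, cell_size=64, min_count=5):
    """Bin outlier pixels (x, y) into square grid cells and return the
    centroid of each cell that holds at least min_count outliers.

    Assumption: a fixed grid stands in for the paper's unspecified
    spatial aggregation; cell_size and min_count are hypothetical.
    """
    cells = defaultdict(list)
    for x, y in outliers:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append((x, y))

    prompts = []
    for key in sorted(cells):  # sorted for a deterministic output order
        pts = cells[key]
        if len(pts) >= min_count:
            cx = sum(p[0] for p in pts) / len(pts)
            cy = sum(p[1] for p in pts) / len(pts)
            prompts.append((cx, cy))
    return prompts


def filter_keypoints(keypoints, occluder_mask):
    """Keep keypoints that fall on valid (non-occluder) pixels.

    Assumption: occluder_mask is a row-major list of rows where a truthy
    value marks an occluder pixel. Out-of-bounds keypoints are dropped.
    """
    height = len(occluder_mask)
    width = len(occluder_mask[0]) if height else 0
    kept = []
    for x, y in keypoints:
        col, row = math.floor(x), math.floor(y)
        in_bounds = 0 <= row < height and 0 <= col < width
        if in_bounds and not occluder_mask[row][col]:
            kept.append((x, y))
    return kept


if __name__ == "__main__":
    # Five outliers cluster in one cell; the isolated one is ignored.
    outliers = [(10, 10), (12, 14), (20, 8), (30, 30), (5, 25), (200, 200)]
    print(aggregate_prompt_points(outliers))

    # 4x4 mask with a single occluder pixel at row 1, column 2.
    mask = [[0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    print(filter_keypoints([(2.5, 1.2), (0.0, 0.0), (5.0, 1.0)], mask))
```

In a full pipeline, the returned prompt points would be passed to a segmentation model to produce the mask, and the same validity test would be reused in the dense stages. Both steps are outside the scope of this sketch.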
