3D reconstruction of construction scenes is an important enabling technology for digital and intelligent construction project management. In tower-crane imagery, recurring foreground occluders and dynamic disturbances can destabilize image registration and introduce spurious depth responses. This paper proposes an occluder-mask-constrained 3D reconstruction framework driven by multi-view geometric anomalies. Geometric outliers between adjacent views are spatially aggregated to generate foreground prompt points, which Segment Anything Model 2 (SAM2) converts into occluder masks. These masks are then propagated as unified pixel-validity constraints through sparse feature filtering, Adaptive Patch Deformation Multi-View Stereo (APD-MVS) matching-cost evaluation, support-region selection, and depth-map fusion. Experiments on three real construction-site datasets show increased sparse-registration completeness in the tested sequences and fewer visually identifiable occluder-induced artifacts in the dense point clouds. A representative 308-image sequence was further evaluated against five baselines: no-mask reconstruction, You Only Look Once version 8 (YOLOv8) bounding-box removal, manually prompted Segment Anything Model 2.1 (SAM2.1), a Segment Anything Model 3 (SAM3) text-prompt baseline, and Visibility-Aware Multi-View Stereo Network (Vis-MVSNet). This evaluation combines sparse-reconstruction metrics, pixel-level mask-quality metrics on a manually annotated validation subset, module-wise runtime accounting, controlled ablations, and aligned dense-point-cloud visualization, and its results likewise indicate improved sparse-stage registration completeness and visible artifact suppression. Because high-precision 3D reference point clouds are unavailable, the dense results are interpreted as visual evidence of artifact suppression rather than as proof of improved absolute dense-reconstruction accuracy.
He et al. studied this question.
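To make the two mask-related steps described in the abstract more concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. `aggregate_outliers` is a hypothetical stand-in for the spatial aggregation of adjacent-view geometric outliers: it bins outlier pixels into a coarse grid and returns the centroid of each sufficiently dense cell as a foreground prompt point. The grid size `cell` and the threshold `min_count` are illustrative assumptions. `filter_keypoints` shows the simplest use of an occluder mask as a pixel-validity constraint at the sparse-feature stage: keypoints that land on occluder pixels are discarded. The SAM2 call that turns prompt points into masks is deliberately omitted.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates


def aggregate_outliers(outliers: Sequence[Point], cell: int = 32,
                       min_count: int = 3) -> List[Point]:
    """Hypothetical aggregation step: bin outlier pixels into a grid of
    `cell`-pixel squares and return the centroid of every cell holding at
    least `min_count` outliers as a foreground prompt point.

    `cell` and `min_count` are illustrative assumptions, not paper values.
    """
    bins: Dict[Tuple[int, int], List[Point]] = defaultdict(list)
    for x, y in outliers:
        bins[(int(x) // cell, int(y) // cell)].append((x, y))
    prompts: List[Point] = []
    for key in sorted(bins):  # deterministic row/column order of grid cells
        pts = bins[key]
        if len(pts) >= min_count:  # isolated outliers are treated as noise
            cx = sum(p[0] for p in pts) / len(pts)
            cy = sum(p[1] for p in pts) / len(pts)
            prompts.append((cx, cy))
    return prompts


def filter_keypoints(keypoints: Sequence[Point],
                     occluder_mask: Sequence[Sequence[bool]]) -> List[Point]:
    """Keep only keypoints on valid pixels.

    `occluder_mask[y][x]` is True where an occluder was segmented; such
    pixels, and keypoints outside the image bounds, are rejected.
    """
    h = len(occluder_mask)
    w = len(occluder_mask[0]) if h else 0
    kept: List[Point] = []
    for x, y in keypoints:
        xi, yi = int(x), int(y)
        if 0 <= yi < h and 0 <= xi < w and not occluder_mask[yi][xi]:
            kept.append((x, y))
    return kept


if __name__ == "__main__":
    # Three clustered outliers form one prompt; the isolated one is dropped.
    print(aggregate_outliers([(10, 10), (12, 14), (15, 11), (200, 200)]))
    # Top-left 2x2 block is an occluder in this toy 4x4 mask.
    mask = [[True, True, False, False],
            [True, True, False, False],
            [False] * 4,
            [False] * 4]
    print(filter_keypoints([(0, 0), (3, 3), (1, 2)], mask))
```

In the full pipeline the same boolean mask would also be consulted when evaluating matching costs, selecting support regions, and fusing depth maps; keeping it as one shared per-pixel array is what lets a single mask act as a unified constraint across those stages.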