Traditional high-precision canopy segmentation relies heavily on tedious pixel-level manual annotation, while general-purpose zero-shot visual detection models are prone to boundary adhesion (merging of adjacent crowns) and high computational cost in dense forest stands. To address this, this study proposes an efficient human–machine collaborative paradigm for canopy segmentation and canopy cover inversion. The paradigm combines the zero-shot pre-annotation capability of text-driven object detection with the high-precision segmentation of LGBU-Net, a lightweight network developed in this study. In the offline annotation stage, Grounding DINO, guided by text prompts, automatically locates candidate canopy regions, and SAM then generates initial pixel-level masks. A high-quality training set is built from these masks with minimal manual correction, substantially reducing the cost of fully manual annotation. An improved LGBU-Net designed for complex forest conditions is subsequently trained on this set in a supervised manner. In the feature extraction stage, a lightweight Ghost coordinate attention module (LG-CAM) strengthens the network's focus on the geometric center of each crown and suppresses semantic interference from the forest background, light spots, and shadows. In the decoding stage, a boundary difference fusion module (BDF-Block) exploits high-frequency gradient information from shallow, low-level features of the UAV imagery to mitigate adhesion between adjacent crown boundaries. A boundary-aware hybrid loss function further sharpens individual tree boundaries in the gradient domain.
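The abstract does not give the exact form of the boundary-aware hybrid loss. The following is a minimal sketch, assuming the loss combines binary cross-entropy, Dice loss, and a gradient-domain term that penalizes differences between the edge maps (forward-difference gradient magnitudes) of the predicted probability map and the reference mask. The choice of terms, the function names, and the weights `w_bce`, `w_dice`, and `w_grad` are hypothetical. Plain Python lists keep the sketch dependency-free; a training implementation would use a tensor library.

```python
import math

def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy; pred holds probabilities, target holds 0/1."""
    total, n = 0.0, 0
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
            n += 1
    return total / n

def dice_loss(pred, target, eps=1e-7):
    """Soft Dice loss: 1 - 2|P*T| / (|P| + |T|)."""
    inter = sum(p * t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    psum = sum(sum(row) for row in pred)
    tsum = sum(sum(row) for row in target)
    return 1.0 - (2.0 * inter + eps) / (psum + tsum + eps)

def grad_mag(img):
    """Forward-difference edge map |dx| + |dy|; the last row/column use a zero difference."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            dx = img[i][j + 1] - img[i][j] if j + 1 < w else 0.0
            dy = img[i + 1][j] - img[i][j] if i + 1 < h else 0.0
            out[i][j] = abs(dx) + abs(dy)
    return out

def gradient_loss(pred, target):
    """Hypothetical gradient-domain term: mean L1 gap between the two edge maps."""
    gp, gt = grad_mag(pred), grad_mag(target)
    h, w = len(pred), len(pred[0])
    return sum(abs(a - b) for ra, rb in zip(gp, gt) for a, b in zip(ra, rb)) / (h * w)

def hybrid_loss(pred, target, w_bce=1.0, w_dice=1.0, w_grad=1.0):
    """Weighted sum of region terms (BCE, Dice) and a boundary term (gradient L1)."""
    return (w_bce * bce(pred, target)
            + w_dice * dice_loss(pred, target)
            + w_grad * gradient_loss(pred, target))

# Usage: a sharp two-crown boundary versus a blurred prediction of it.
target = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1]]
blurred = [[0.1, 0.4, 0.6, 0.9], [0.1, 0.4, 0.6, 0.9], [0.1, 0.4, 0.6, 0.9]]
print(round(hybrid_loss(blurred, target), 4))
```

The gradient term is zero only when the predicted edge map matches the reference one, so a prediction that smears the boundary between adjacent crowns is penalized even where its region-level overlap is already high.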
Experiments on UAV imagery of high-density mixed and coniferous forests in Baishan, Jilin Province, show that, at low manual annotation cost, LGBU-Net achieves a canopy segmentation IoU of 90.45% and an individual tree separation F1 score of 89.35% with only 4.85 M parameters, significantly outperforming general-purpose vision models applied through zero-shot direct inference. Furthermore, the segmentation results are used to invert plot-level canopy vertical cover (CC), and the estimates agree closely with ground-based measurements. This research provides a high-precision, low-annotation-cost solution with good potential for edge deployment in large-scale forest resource surveys and assessments of the understory light environment.
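The abstract reports pixel IoU, an individual tree separation F1 score, and plot-level CC without stating their formulas. The following is a minimal sketch under standard definitions: IoU between the predicted and reference canopy masks, and CC as the fraction of plot pixels covered by the vertically projected canopy mask. It also computes F1 from a greedy one-to-one matching of predicted and reference crowns at an IoU threshold of 0.5. The threshold, the greedy matching rule, and all function names are assumptions for illustration, not the study's evaluation code.

```python
def iou(pred, ref):
    """Pixel-level IoU of two equal-sized binary masks (2D lists of 0/1)."""
    inter = union = 0
    for prow, rrow in zip(pred, ref):
        for p, r in zip(prow, rrow):
            a, b = bool(p), bool(r)
            inter += a and b
            union += a or b
    return inter / union if union else 1.0

def canopy_cover(mask, plot=None):
    """CC: share of plot pixels whose vertical projection is canopy.
    `plot` is an optional 0/1 mask of the plot footprint (assumed input)."""
    canopy = total = 0
    for i, row in enumerate(mask):
        for j, v in enumerate(row):
            if plot is None or plot[i][j]:
                total += 1
                canopy += bool(v)
    return canopy / total if total else 0.0

def instance_f1(pred_crowns, ref_crowns, thr=0.5):
    """F1 of individual-crown separation. Crowns are non-empty sets of
    (row, col) pixels; pairs with IoU >= thr are matched greedily, one-to-one."""
    pairs = []
    for i, p in enumerate(pred_crowns):
        for j, r in enumerate(ref_crowns):
            s = len(p & r) / len(p | r)
            if s >= thr:
                pairs.append((s, i, j))
    pairs.sort(reverse=True)  # best-overlapping pairs first
    used_p, used_r = set(), set()
    for _, i, j in pairs:
        if i not in used_p and j not in used_r:
            used_p.add(i)
            used_r.add(j)
    tp = len(used_p)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_crowns)
    recall = tp / len(ref_crowns)
    return 2 * precision * recall / (precision + recall)

# Usage on a toy 3x3 plot.
pred = [[1, 1, 0], [1, 0, 0], [0, 0, 0]]
ref = [[1, 1, 0], [0, 0, 0], [0, 0, 0]]
print(iou(pred, ref), canopy_cover(pred))
print(instance_f1([{(0, 0), (0, 1)}, {(2, 2)}], [{(0, 0), (0, 1)}, {(1, 1)}]))
```

Greedy matching by descending IoU is a common simplification; at a threshold of 0.5 or above, each crown can overlap at most one counterpart that strongly, so the result coincides with an optimal assignment for non-overlapping instances.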
Chen et al. (Mon,) studied this question.