Accurate individual tree crown (ITC) segmentation from unmanned aerial vehicle (UAV) imagery is important for fine-scale forest inventory, plantation management, and ecological monitoring. However, delineating ITCs in dense plantations remains difficult: crowns are tightly packed, canopy structure is highly homogeneous, and crown boundaries are often blurred, so existing methods struggle to preserve both regional integrity and boundary continuity. This study proposes the Perceptual Segment-Anything Model with Multi-head Cross-Parallel Attention (Per-SAM-MCPA), a lightweight and effective framework for fine-grained ITC segmentation in dense plantation scenes. Built on a compact ResNet-50 backbone, the framework integrates perceptual target-aware representation, multi-scale detail enhancement, global contextual modeling, and semantic-boundary collaborative refinement to improve crown discrimination and structural consistency. A perceptual relation module strengthens pixel-level semantic dependency modeling, and a Multi-head Cross-Parallel Attention (MCPA) mechanism captures long-range contextual interactions through orthogonally decomposed spatial attention, improving global geometric consistency with limited computational overhead. A Composite Constraint Loss (CCL) is introduced to jointly optimize region-level segmentation quality and boundary fidelity; it combines a weighted cross-entropy loss, a structural similarity loss, and a boundary term based on the Hausdorff distance. Experiments on the Catalpa bungei UAV dataset show that the proposed method achieves an intersection over union (IoU) of 87.3% and an F1-score of 91.0%, outperforming representative baselines such as SAM and Mask R-CNN while maintaining an inference speed of 35.7 FPS on a single GPU. These results indicate that Per-SAM-MCPA offers an accurate, efficient, and practical solution for ITC segmentation in dense plantation environments.
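For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how IoU and F1 can be computed for a single pair of binary crown masks. It is an illustration only, not the authors' evaluation code: the function name `iou_and_f1`, the nested-list mask format, and the toy masks are assumptions, and the abstract does not state how the scores are aggregated across crowns or images.

```python
from typing import Sequence, Tuple


def iou_and_f1(
    pred: Sequence[Sequence[int]],
    truth: Sequence[Sequence[int]],
) -> Tuple[float, float]:
    """Return (IoU, F1) for two binary masks of identical shape.

    Hypothetical helper for illustration; masks are nested sequences
    where nonzero means "crown" and zero means "background".
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # crown pixel predicted correctly
            elif p and not t:
                fp += 1  # background predicted as crown
            elif t and not p:
                fn += 1  # crown pixel that was missed

    union = tp + fp + fn
    if union == 0:
        # Both masks are empty; treating this as perfect agreement
        # is a convention chosen here, not something from the paper.
        return 1.0, 1.0

    iou = tp / union                   # |A ∩ B| / |A ∪ B|
    f1 = 2 * tp / (2 * tp + fp + fn)   # harmonic mean of precision and recall
    return iou, f1


# Toy 2x3 masks (made-up values, not data from the study).
pred_mask = [[1, 1, 0],
             [0, 1, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1]]
iou, f1 = iou_and_f1(pred_mask, true_mask)
print(f"IoU={iou:.3f}, F1={f1:.3f}")
```

On a single mask pair the two scores are tied together by F1 = 2·IoU / (1 + IoU); once scores are averaged over many crowns or images, that identity no longer has to hold for the averaged values.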
Hu et al. studied this question.