Abstract
Deep learning methods such as Mask R-CNN enable the precise delineation of single-tree crowns from remote sensing data. However, their segmentation performance still depends on local stand conditions. Using UAV multispectral imagery and lidar canopy height models (CHM), we assessed the influence of tree species composition, stand density, and foliage condition on the robustness of deep-learning-based single-tree segmentation. High-resolution laser data and multispectral data were collected over several hectares of forest area (Bavarian Forest National Park; DBU Natural Heritage, Schönau Foundation; Black Forest National Park, Kinzigtal) using a DJI 600 Pro drone. The Fraunhofer Lightweight Airborne Profiler collected a multispectral point cloud using a 905-nm laser and two integrated RGB cameras with 4112 × 3008 pixels. An additional multispectral camera captured RGB imagery with 4112 × 3008 pixels and two monochrome bands (725 nm RE, 850 nm NIR; 2164 × 2056 pixels each). Flights were conducted at 80 m altitude with ≥ 50% lateral overlap, resulting in an average point density of 150 points/m². Different models were trained and validated using multispectral images (RGB, CIR), images derived from the CHM, and images fused from the CHM and two near-infrared channels (RE, NIR). Highly accurate tree positions and manually processed tree segments were available for accuracy analyses. With the best-performing CHM channel combination, the average F1 score across the three study areas was 70% (range: 36–100%). In the Bavarian Forest, the highest F1 score was 82%, surpassing that obtained with baseline methods by up to 39%. In the Black Forest, the highest F1 score was 85%, but scores were more than 50% lower in complex deciduous plots. The gains in the DBU Schönau Foundation area were smaller, with F1 scores up to 90%, about 20% above baseline.
Multispectral channel combinations, such as CIR or CHM with two IR bands (725 nm, 850 nm), contributed only marginally to tree segmentation. The accuracy in coniferous areas reached 81%, which was about 20% higher than in deciduous stands, although this was influenced by the high stem densities (1000 stems per hectare). In a single reference plot, the results improved substantially under leaf-off conditions. Polygons from delineated tree crowns were of significantly better quality than those from baseline methods. Overall, this study demonstrated the superiority of deep-learning-based tree segmentation in complex, dense forest structures.
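For readers unfamiliar with the metric, the F1 scores reported above combine precision and recall of the detected crowns. The following is a minimal sketch of that calculation, not the authors' evaluation code: the abstract does not state how predicted crowns were matched to reference trees, so the counts of matched and unmatched crowns are taken as given, and the function name and the plot numbers are hypothetical.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall, computed from detection counts.

    true_positives:  predicted crowns matched to a reference tree
    false_positives: predicted crowns with no matching reference tree
    false_negatives: reference trees with no matching predicted crown
    """
    detected = true_positives + false_positives
    reference = true_positives + false_negatives
    # Guard against division by zero when nothing was detected or matched.
    if detected == 0 or reference == 0 or true_positives == 0:
        return 0.0
    precision = true_positives / detected
    recall = true_positives / reference
    return 2 * precision * recall / (precision + recall)


# Hypothetical plot (illustrative numbers, not from the study):
# 40 reference trees, 38 predicted crowns, 32 of them correctly matched.
plot_f1 = f1_score(true_positives=32, false_positives=6, false_negatives=8)
print(f"F1 = {plot_f1:.1%}")
```

Because F1 penalizes both missed trees and spurious crowns, a plot can score low either through under-segmentation (merged crowns) or over-segmentation (split crowns), which is one reason dense deciduous stands are harder than coniferous ones.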
Krzystek et al. studied this question.