Accurate segmentation of target fruits is essential for automated field management. However, the challenge lies in the fact that many fruits remain green for extended periods, closely resembling the colors of leaves and branches, thus making accurate identification difficult. While current multi-modal methods that utilize depth information can mitigate this problem, the high cost of equipment for acquiring such data limits the practical implementation of these techniques. To tackle this challenge, we introduce the monocular depth estimation technique Depth Anything V2 to fruit segmentation tasks, proposing a novel monocular depth-assisted instance segmentation framework, DepthCL-Seg. Within DepthCL-Seg, the Cross-modal Complementary Fusion (CCF) module effectively fuses RGB and depth information to enhance feature representation in low-contrast target regions. Additionally, a low-contrast adaptive refinement (LAR) module is designed to improve discrimination of easily confusable boundary pixels. Experimental results show that DepthCL-Seg achieves mAP scores of 74.2% and 86.0% on our self-constructed green fig and green peach datasets, respectively. These scores surpass the classical Mask R-CNN by 7.5% and 4.4%, and significantly outperform current mainstream methods. This framework provides novel technical support for automated management in fruit cultivation.
Shang et al. (Thu,) studied this question.