What question did this study set out to answer?

The aim is to enhance the accuracy of fruit segmentation using monocular depth information.

January 25, 2026 · Open Access

DepthCL-Seg: Dual-Stream Feature Fusion for Green Fruit Instance Segmentation Based on Monocular Depth

Key points

Introduced DepthCL-Seg, a monocular depth-assisted framework for green fruit instance segmentation.
Used Depth Anything V2 for monocular depth estimation, avoiding the need for costly depth-sensing equipment.
Employed a Cross-modal Complementary Fusion (CCF) module to fuse RGB and depth information in low-contrast target regions.
Developed a low-contrast adaptive refinement (LAR) module to improve discrimination of easily confused boundary pixels.
Achieved mAP scores of 74.2% and 86.0% on the self-constructed green fig and green peach datasets, respectively.
Surpassed Mask R-CNN by 7.5% and 4.4% mAP on the two datasets, respectively.
Significantly outperformed current mainstream methods.

Abstract

Accurate segmentation of target fruits is essential for automated field management. However, the challenge lies in the fact that many fruits remain green for extended periods, closely resembling the colors of leaves and branches, thus making accurate identification difficult. While current multi-modal methods that utilize depth information can mitigate this problem, the high cost of equipment for acquiring such data limits the practical implementation of these techniques. To tackle this challenge, we introduce the monocular depth estimation technique Depth Anything V2 to fruit segmentation tasks, proposing a novel monocular depth-assisted instance segmentation framework, DepthCL-Seg. Within DepthCL-Seg, the Cross-modal Complementary Fusion (CCF) module effectively fuses RGB and depth information to enhance feature representation in low-contrast target regions. Additionally, a low-contrast adaptive refinement (LAR) module is designed to improve discrimination of easily confusable boundary pixels. Experimental results show that DepthCL-Seg achieves mAP scores of 74.2% and 86.0% on our self-constructed green fig and green peach datasets, respectively. These scores surpass the classical Mask R-CNN by 7.5% and 4.4%, and significantly outperform current mainstream methods. This framework provides novel technical support for automated management in fruit cultivation.
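
The abstract does not describe the internals of the CCF module, so the following is only a generic sketch of one common way to fuse RGB and depth features: a learned gate that decides, per pixel, how much to trust each modality. The function name `gated_fusion` and the parameters `w_rgb`, `w_depth`, and `bias` are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(rgb_feat, depth_feat, w_rgb, w_depth, bias):
    """Fuse two equally sized 2-D feature maps (lists of rows).

    Illustrative assumption only: this is a generic gated fusion,
    not the paper's Cross-modal Complementary Fusion module.
    """
    fused = []
    for rgb_row, depth_row in zip(rgb_feat, depth_feat):
        row = []
        for r, d in zip(rgb_row, depth_row):
            # Gate in (0, 1): near 1 favors RGB, near 0 favors depth.
            gate = sigmoid(w_rgb * r + w_depth * d + bias)
            # Convex combination of the two modalities at this pixel.
            row.append(gate * r + (1.0 - gate) * d)
        fused.append(row)
    return fused


# Toy single-channel feature maps (hypothetical values).
rgb = [[2.0, 0.0], [1.0, -1.0]]
depth = [[4.0, 2.0], [1.0, 3.0]]

# With zero weights and bias the gate is 0.5, so the output is the mean.
averaged = gated_fusion(rgb, depth, 0.0, 0.0, 0.0)
```

Because the output is a convex combination, each fused value always lies between the corresponding RGB and depth values. In a real network the gate would typically be produced by learned convolutional layers rather than three scalars.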
