March 3, 2026 · Open Access

From pixels to points: An AI framework with weaker-and-fewer-labels for lightweight 3D phenotyping using 2D-3D coordinate mapping and VLMs

Key Points

Real-time low-cost 3D phenotyping for tomato plants achieves high accuracy.
The YOLO11-segment model achieves a mean Average Precision (mAP₅₀) of 96.0% for plant segmentation.
The method uses 2D-3D coordinate mapping to simplify 3D segmentation, requiring only weak labels.
The approach significantly reduces annotation dependency and computational complexity for effective deployment.
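
The 2D-3D coordinate mapping named in the key points can be illustrated with a minimal sketch. It assumes, beyond what the source states, that the point cloud is stored as an organized array with one XYZ point per image pixel, so that a 2D mask can index it directly. The function name `extract_plant_points` and the NaN convention for missing depth are hypothetical and are not taken from the study.

```python
import numpy as np

def extract_plant_points(organized_cloud, mask):
    """Select the 3D points whose pixels fall inside a 2D mask.

    organized_cloud: (H, W, 3) float array, one XYZ point per pixel
                     (assumption: NaN marks pixels without valid depth).
    mask:            (H, W) boolean array from a 2D segmentation model.
    Returns an (N, 3) array of valid 3D points inside the mask.
    """
    if organized_cloud.shape[:2] != mask.shape:
        raise ValueError("mask and point cloud must share the same H x W grid")
    points = organized_cloud[mask]             # boolean indexing -> (N, 3)
    valid = np.isfinite(points).all(axis=1)    # drop points with missing depth
    return points[valid]

# Toy example: a 2 x 3 pixel grid with one missing depth value.
cloud = np.zeros((2, 3, 3))
cloud[..., 2] = 1.0
cloud[0, 1] = np.nan
mask = np.array([[True, True, False],
                 [False, True, False]])
plant_points = extract_plant_points(cloud, mask)
```

Because the selection is a single boolean index on the pixel grid, no 3D segmentation model or 3D annotation is involved in this step, which is the appeal of working from spatially aligned 2D masks.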

Abstract

3D phenotyping of seedlings is crucial to tomato cultivation in greenhouse facilities. Current studies focus on high-quality point cloud reconstruction and artificial intelligence (AI) based 3D segmentation to derive phenotypic traits such as plant height and crown width; these approaches rely heavily on manual annotation and are complex to deploy. This study proposes a novel AI framework, from pixels to points, for efficient 3D phenotyping of tomato seedlings. By integrating 2D-3D coordinate mapping with vision language models (VLMs), the method reconstructs and analyzes 3D phenotypic traits accurately from single-view data. A binocular camera captures top-down RGB images and the corresponding spatially aligned point clouds. VLMs prompted with the text “plant” automatically generate bounding boxes and masks, minimizing manual annotation. These outputs are then transferred to a lightweight YOLO11-segment model. The core innovation is the 2D-3D mapping strategy, which extracts plant-specific 3D points efficiently using only 2D masks. Non-plant points within the initial masks are repurposed to determine ground height, improving plant height estimation, while the masks are refined using the Excess Green Index to improve crown width measurement. The YOLO11-segment model achieves an mAP₅₀ of 96.0%. For sparse canopies, the phenotyping approach yields RMSE values of 1.7 cm for plant height and 1.0 cm for crown width, with R² values of 0.93 and 0.95, respectively, against manual measurements. For dense canopies, a reference chessboard improves performance, reducing the RMSE from 9.57 cm to 2.07 cm. The method significantly reduces annotation dependency and computational complexity, supports edge deployment, and enables efficient technology transfer. It offers considerable potential for high-throughput screening of elite tomato varieties with desirable agronomic traits.

• Real-time low-cost 3D phenotyping of tomato plants is proposed.
• Weak labels simplify the 3D plant segmentation.
• The 3D point cloud is segmented using spatially aligned 2D pixel masks.
• Vision language models and knowledge transfer further simplify the AI application.
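
The ground-height and mask-refinement steps described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the Excess Green Index is computed in its common form ExG = 2g − r − b on chromaticity-normalized channels, the threshold `0.1` is a hypothetical value, non-green pixels inside the mask are treated as ground, the ground level is taken as their median depth, and crown width is taken as the larger horizontal extent of the refined plant points. The abstract does not specify any of these choices.

```python
import numpy as np

def excess_green(rgb):
    """ExG = 2g - r - b, with r, g, b the channels divided by their sum."""
    rgb = rgb.astype(np.float64)
    total = rgb.sum(axis=-1, keepdims=True)
    total[total == 0] = 1.0                    # avoid 0/0 on black pixels
    r, g, b = np.moveaxis(rgb / total, -1, 0)
    return 2.0 * g - r - b

def height_and_crown_width(cloud, rgb, mask, exg_threshold=0.1):
    """Estimate plant height and crown width from a top-down view.

    cloud: (H, W, 3) XYZ per pixel, Z = distance from the camera (assumption).
    rgb:   (H, W, 3) image aligned with the cloud.
    mask:  (H, W) boolean initial mask for one plant.
    exg_threshold: hypothetical cut-off separating plant from background.
    """
    finite = np.isfinite(cloud).all(axis=-1)
    plant = mask & finite & (excess_green(rgb) > exg_threshold)  # refined mask
    ground = mask & finite & ~plant            # non-plant pixels inside the mask
    if not plant.any() or not ground.any():
        raise ValueError("need both plant and ground pixels inside the mask")
    z = cloud[..., 2]
    ground_z = np.median(z[ground])            # ground distance from the camera
    height = ground_z - z[plant].min()         # highest plant point is nearest
    xy = cloud[plant][:, :2]
    crown_width = (xy.max(axis=0) - xy.min(axis=0)).max()
    return float(height), float(crown_width)
```

Using the median for the ground level is one way to limit the influence of stray depth values; a dense canopy leaves few or no ground pixels inside the mask, which is consistent with the abstract's use of a reference chessboard in that case.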
