This paper presents a method for automatically generating synthetic data and object detection datasets in virtual environments based on visual information from a tomato-harvesting robot. Labor shortages and population aging are pressing challenges in agricultural production. Addressing them requires information technology and robotics to improve efficiency and automate tasks. Fruit-harvesting robots rely on object detection to estimate the position and maturity of fruits. However, illumination variations, complex backgrounds, and occlusions in real environments reduce detection accuracy, and manual annotation remains labor-intensive. This study reconstructs 3D models of a cultivation scene using 3D Gaussian splatting and builds a virtual environment in Unreal Engine 5 that reproduces the robot's camera view. In this environment, fruit positions, maturity classes, and illumination conditions are randomized to ensure diversity, and You Only Look Once (YOLO)-format annotations are automatically exported from bounding boxes. The YOLO detection model, trained through transfer learning, was then evaluated on images captured in a real environment. The proposed framework contributes to efficient data collection and dataset generation, thereby improving the adaptability of vision systems for agricultural robots.
Ushiroji et al. studied this question.
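The export of YOLO-format annotations from bounding boxes can be illustrated with a minimal sketch. It assumes each box is given in pixel coordinates as (x_min, y_min, x_max, y_max) in the rendered image, and that maturity classes map to integer IDs. The function name `bbox_to_yolo_line` and the clamping behavior are hypothetical choices made for illustration, not the export code used in this study.

```python
from typing import Optional


def bbox_to_yolo_line(
    class_id: int,
    x_min: float,
    y_min: float,
    x_max: float,
    y_max: float,
    img_w: int,
    img_h: int,
) -> Optional[str]:
    """Convert a pixel-space bounding box to one YOLO label line.

    A YOLO label line is "class x_center y_center width height",
    with all four box values normalized to [0, 1] by the image size.
    Returns None when the box has no visible area inside the image.
    """
    # Clamp the box to the image bounds (assumption: boxes that extend
    # partly outside the camera view are cropped, not discarded).
    x_min = max(0.0, min(float(x_min), float(img_w)))
    x_max = max(0.0, min(float(x_max), float(img_w)))
    y_min = max(0.0, min(float(y_min), float(img_h)))
    y_max = max(0.0, min(float(y_max), float(img_h)))

    box_w = x_max - x_min
    box_h = y_max - y_min
    if box_w <= 0 or box_h <= 0:
        return None

    # Normalize the box center and size by the image dimensions.
    x_center = (x_min + x_max) / 2.0 / img_w
    y_center = (y_min + y_max) / 2.0 / img_h
    return (
        f"{class_id} {x_center:.6f} {y_center:.6f} "
        f"{box_w / img_w:.6f} {box_h / img_h:.6f}"
    )
```

Each rendered image would then get one text file containing one such line per visible fruit, which is the label layout that YOLO training tools commonly expect.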