What question did this study set out to answer?

The study aims to create synthetic datasets for object detection to improve the efficiency of tomato-harvesting robots.

March 7, 2026Open Access

Automatic Generation of Synthetic Data and Object Detection Datasets in Virtual Environments Based on Tomato-Harvesting Robot Vision

Key Points

The study aims to create synthetic datasets for object detection to improve the efficiency of tomato-harvesting robots.
Developed a virtual environment using Unreal Engine 5.
Reconstructed 3D models of cultivation scenes with 3D Gaussian splatting.
Randomized fruit positions, maturity classes, and illumination conditions for diversity.
Automatically exported YOLO format annotations from bounding boxes.
Evaluated detection performance with a model trained through transfer learning.
Demonstrated improvements in detection accuracy under varied conditions.
Provided efficient data collection mechanisms.
Facilitated the training of detection models with synthesized data.

Abstract

This paper presents a method for the automatic generation of synthetic data and object detection datasets in virtual environments based on visual information from a tomato-harvesting robot. Labor shortages and population aging are pressing challenges in agricultural production. To address these challenges, information technology and robotics are required to improve efficiency and automate tasks. Fruit-harvesting robots rely on object detection to estimate the position and maturity of fruits. However, illumination variations, complex backgrounds, and occlusions in real environments reduce detection accuracy, and manual annotation remains labor-intensive. This study reconstructs 3D models of a cultivation scene using 3D Gaussian splatting and builds a virtual environment in Unreal Engine 5, which reproduces the camera view of the robot. In this environment, fruit positions, maturity classes, and illumination conditions are randomized to ensure diversity, and You Only Look Once (YOLO) format annotations are automatically exported from bounding boxes. The detection model trained with YOLO through transfer learning was subsequently used to evaluate the detection performance of images captured in a real environment. The proposed framework contributes to efficient data collection and dataset generation, thereby improving the adaptability of vision systems for agricultural robots.

Read Full Paperexternally

Bookmark

View Full Paper

Cite This Study

Ushiroji et al. (Sun,) studied this question.

synapsesocial.com/papers/69abc1d75af8044f7a4eae61 https://doi.org/https://doi.org/10.1016/j.atech.2026.101947

Bookmark

View Full Paper