In structured orchard environments, harvesting robots operate where rigid bodies (e.g., trunks, poles, and wires) coexist with flexible foliage, and strictly avoiding every obstacle significantly compromises operational efficiency. To address this, this study proposes an end-to-end autonomous harvesting framework built on an “avoid-rigid, push-through-soft” strategy. The framework explicitly propagates uncertainty from sensor data and scene reconstruction into the planning and policy phases. First, a multi-task perception network produces 2D semantic masks of fruits and branches. The class probabilities and instance IDs are then back-projected onto a 3D Gaussian Splatting (3DGS) representation to build a decision-oriented, semantically enhanced 3D scene model. A policy network takes multi-channel observations rendered from the 3DGS model, together with proprioceptive states, as input and outputs a continuous preference vector over eight predefined motion primitives, unifying path planning and action decision-making within a single closed loop. In addition, a dynamic action shielding module performs look-ahead collision risk assessment on the candidate discrete actions and applies an action mask that blocks those likely to collide with rigid obstacles. This suppresses high-risk behaviors during both training and execution, improving the robustness and reliability of robotic manipulation. The method was validated in both simulation and real-world scenarios. In complex orchard scenarios, the proposed AE-TD3 algorithm achieves a harvesting success rate of 77.1%, outperforming the RRT (53.3%), DQN (60.9%), and TD3 (63.8%) baselines. It also shows better safety and real-time performance, with the collision rate reduced to 16.2% and an average operation time of only 12.4 s. These results indicate that the framework supports efficient harvesting operations while maintaining safety.
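The action shielding step can be illustrated with a minimal sketch. It assumes a few things that the abstract does not specify. Rigid obstacles are modelled as spheres. The eight motion primitives are fixed end-effector displacement directions (`PRIMITIVES`, hypothetical). The look-ahead is a sampled straight-line step of length `step` with a safety `margin`. The helper names `action_mask` and `select_primitive` are also hypothetical. Soft foliage is deliberately not passed to the shield, which mirrors the “push-through-soft” idea. This is an illustration of masking a preference vector, not the authors' implementation.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Sphere = Tuple[Vec3, float]  # (center, radius) of a rigid obstacle (assumed model)

# Hypothetical set of eight motion primitives (end-effector directions).
PRIMITIVES: List[Vec3] = [
    (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0), (0.0, -1.0, 0.0),
    (0.0, 0.0, 1.0), (0.0, 0.0, -1.0),
    (1.0, 0.0, 1.0), (-1.0, 0.0, 1.0),
]


def _step_hits_sphere(start: Vec3, unit_dir: Vec3, length: float,
                      center: Vec3, radius: float, n_checks: int) -> bool:
    """Sample points along the look-ahead step and test them against one sphere."""
    for k in range(1, n_checks + 1):
        t = length * k / n_checks
        point = tuple(s + t * d for s, d in zip(start, unit_dir))
        if math.dist(point, center) <= radius:
            return True
    return False


def action_mask(ee_pos: Vec3, primitives: Sequence[Vec3],
                rigid_obstacles: Sequence[Sphere], step: float = 0.05,
                margin: float = 0.02, n_checks: int = 10) -> List[bool]:
    """Return True for each primitive whose look-ahead step avoids all rigid obstacles.

    Only rigid obstacles are checked; soft foliage is intentionally ignored so the
    policy may push through it.
    """
    mask = []
    for direction in primitives:
        norm = math.sqrt(sum(c * c for c in direction))
        unit = tuple(c / norm for c in direction)
        safe = not any(
            _step_hits_sphere(ee_pos, unit, step, center, radius + margin, n_checks)
            for center, radius in rigid_obstacles
        )
        mask.append(safe)
    return mask


def select_primitive(preferences: Sequence[float],
                     mask: Sequence[bool]) -> Optional[int]:
    """Pick the highest-preference primitive among the unmasked ones.

    Returns None when every primitive is blocked, leaving the fallback
    (stop, retreat, replan) to the caller.
    """
    allowed = [i for i, ok in enumerate(mask) if ok]
    if not allowed:
        return None
    return max(allowed, key=lambda i: preferences[i])
```

For example, with the end effector at the origin and a rigid pole of radius 0.01 centred 0.05 ahead along +x, the +x primitive is masked. The selection then falls back to the best remaining preference. The same mask could be applied to the policy's output during training, so that blocked primitives are never executed.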
Fu et al. studied this question.