What question did this study set out to answer?

The research aims to improve harvesting efficiency and safety in robotic systems operating in orchards by combining obstacle avoidance strategies with deep reinforcement learning.

March 18, 2026 · Open Access

Push-or-Avoid: Deep Reinforcement Learning of Obstacle-Aware Harvesting for Orchard Robots

Key Points

Developed an end-to-end autonomous harvesting framework built on an “avoid-rigid, push-through-soft” strategy.
Used a multi-task perception network to produce 2D semantic masks of fruits and branches.
Built a decision-oriented, semantically enhanced 3D scene model by back-projecting class probabilities and instance IDs onto a 3D Gaussian Splatting (3DGS) representation.
Implemented a policy network that takes multi-channel 3DGS rendered observations and proprioceptive states as inputs and outputs a continuous preference vector over eight predefined motion primitives.
Designed a dynamic action shielding module that performs look-ahead collision risk assessment and masks candidate actions likely to collide with rigid obstacles.
Achieved a harvesting success rate of 77.1% with the proposed AE-TD3 algorithm in complex orchard scenarios.
Outperformed RRT (53.3%), DQN (60.9%), and TD3 (63.8%) in harvesting success rate.
Reduced the collision rate to 16.2% with an average operation time of 12.4 s.

Abstract

In structured orchard environments, harvesting robots operate where rigid bodies (e.g., trunks, poles, and wires) coexist with flexible foliage. Strict avoidance of all obstacles significantly compromises operational efficiency. To address this, this study proposes an end-to-end autonomous harvesting framework characterized by an “avoid-rigid, push-through-soft” strategy. This framework explicitly propagates uncertainties from sensor data and reconstruction processes into the planning and policy phases. First, a multi-task perception network acquires 2D semantic masks of fruits and branches. Class probabilities and instance IDs are back-projected onto a 3D Gaussian Splatting (3DGS) representation to construct a decision-oriented, semantically enhanced 3D scene model. The policy network accepts multi-channel 3DGS rendered observations and proprioceptive states as inputs, outputting a continuous preference vector over eight predefined motion primitives. This approach unifies path planning and action decision-making within a single closed loop. Additionally, a dynamic action shielding module was designed to perform look-ahead collision risk assessments on candidate discrete actions. By employing an action mask to block actions potentially colliding with rigid obstacles, high-risk behaviors are effectively suppressed during both training and execution, thereby enhancing the robustness and reliability of robotic manipulation. The proposed method was validated in both simulation and real-world scenarios. In complex orchard scenarios, the proposed AE-TD3 algorithm achieves a harvesting success rate of 77.1%, outperforming existing RRT (53.3%), DQN (60.9%), and TD3 (63.8%) methods. Furthermore, the method demonstrates superior safety and real-time performance, with a collision rate reduced to 16.2% and an average operation time of only 12.4 s. Results indicate that the framework effectively supports efficient harvesting operations while ensuring safety.
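The action shielding idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_action`, the `rigid_risk` input, and the `risk_threshold` value are all hypothetical, and the look-ahead risk estimates are assumed to be supplied by some external collision check. The sketch only shows how a continuous preference vector over eight primitives might be combined with an action mask so that primitives likely to hit rigid obstacles are never chosen, while contact with soft foliage is not penalized by the mask.

```python
from typing import Optional, Sequence

N_PRIMITIVES = 8  # the abstract states eight predefined motion primitives


def select_action(preferences: Sequence[float],
                  rigid_risk: Sequence[float],
                  risk_threshold: float = 0.5) -> Optional[int]:
    """Pick the highest-preference primitive that passes the rigid-collision mask.

    preferences: continuous preference vector from the policy (one per primitive).
    rigid_risk:  hypothetical look-ahead risk of hitting a rigid obstacle,
                 one value in [0, 1] per primitive; soft foliage is ignored here.
    Returns the primitive index, or None if every primitive is masked out.
    """
    if len(preferences) != N_PRIMITIVES or len(rigid_risk) != N_PRIMITIVES:
        raise ValueError("expected one value per motion primitive")

    # Action mask: True means the primitive is allowed.
    mask = [risk < risk_threshold for risk in rigid_risk]
    allowed = [i for i in range(N_PRIMITIVES) if mask[i]]
    if not allowed:
        return None  # assumption: the caller falls back to a safe stop

    # Among the allowed primitives, follow the policy's preference.
    return max(allowed, key=lambda i: preferences[i])


# Example: primitive 1 has the highest preference but is blocked by the mask,
# so the next-best allowed primitive (index 6) is selected.
prefs = [0.1, 0.9, 0.3, 0.2, 0.0, 0.4, 0.8, 0.5]
risk = [0.0, 0.9, 0.1, 0.0, 0.0, 0.2, 0.3, 0.0]
print(select_action(prefs, risk))  # prints 6
```

Applying the mask before taking the maximum, rather than penalizing risky actions in the reward alone, means a blocked primitive cannot be selected regardless of how strongly the policy prefers it. That matches the abstract's description of suppressing high-risk behaviors during both training and execution, though the threshold and the fallback behavior here are assumptions.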
