June 23, 2026 · Open Access

Enduring Heterogeneous Cooperative Embodied-agent Control: Long-sequence Task Collaboration via Reinforcement Fine-tuning

Key Points

The research aims to enhance warehouse dispatch systems through a framework for multi-agent cooperation.
Developed a heterogeneous multi-agent system with two quadruped robots and one wheeled dual-arm mobile manipulator.
Formulated long-horizon collaborative dispatching as an observable multi-agent Markov decision process initialized through imitation learning.
Employed reinforcement learning fine-tuning with reward shaping, curriculum learning, and safety filtering.
Achieved a success rate of 77.7% and categorization accuracy of 81.9%.
Attained a throughput of 1.10 parcels per minute and a non-collision rate of 82.6%.
Reached a right-dispatch rate of 84.7%; comparative and ablation analyses show clear advantages over the baseline.

Abstract

Modern warehouse dispatch systems require high throughput and flexibility while maintaining safe and reliable operations. However, traditional conveyor-based automation and state machine-based control methods struggle to handle fragile-item sorting and dynamic layouts, often resulting in dispatch errors during long-term operation. To address these challenges, this paper proposes a heterogeneous embodied multi-agent cooperation framework for parcel dispatching and sorting tasks. The proposed system integrates two quadruped robots and one wheeled dual-arm mobile manipulator, and formulates long-horizon collaborative dispatching as an observable multi-agent Markov decision process initialized through imitation learning. A reinforcement learning fine-tuning framework is developed by incorporating reward shaping, curriculum learning, and safety filtering, while Variance-Suppressed Policy Optimization (VSPO) and Reinforced Advantage Decision (ReAd) mechanisms are introduced to improve policy optimization efficiency, coordination stability, and decision reliability. Experimental results demonstrate that the proposed method achieves a success rate of 77.7%, a categorization accuracy of 81.9%, and a throughput of 1.10 parcels per minute across L1 and L2 scenarios, along with a non-collision rate of 82.6% and a right-dispatch rate of 84.7%. Comparative and ablation analyses further show clear advantages over the baseline. For next-generation logistics scenarios, these results demonstrate the potential of the proposed heterogeneous multi-robot system in handling complex, long-horizon collaboration.
