May 22, 2026 · Open Access

Deep Reinforcement Learning with Instance-Invariant Baseline Regularization for Joint Retrieval and Relocation Scheduling in Multi-Deep Warehouses

Key Points

This work aims to minimize makespan in multi-deep automated vehicle storage and retrieval systems using deep reinforcement learning.
Developed a DRL framework with a heterogeneous graph-based state representation for joint retrieval and relocation decisions.
Introduced Instance-Invariant Baseline Regularization to stabilize policy learning.
Conducted extensive experiments, including generalization tests on 64 unseen warehouse configurations of varying scale.
The trained policy consistently achieved lower makespan than heuristic baselines on these configurations, demonstrating strong generalization.
Showed stable convergence across all tested configurations.
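The page does not specify node features, edge types, or how actions are encoded. As a rough illustration only, the following Python sketch assumes a multi-deep rack made of single-sided lanes (depth 0 faces the aisle), labels every slot with one of the three entity types the paper names (requested items, non-requested items, empty locations), and derives a list of valid actions in which retrievals target requested-item nodes and relocations pair a blocking non-requested item with an empty-location node. The names `HeteroGraph`, `build_graph`, `valid_actions`, the `in_front_of` edge type, and the rule of relocating only into the deepest reachable empty slot of another lane are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the three entity types named in the paper.
REQ, NONREQ, EMPTY = "requested", "non_requested", "empty"


@dataclass
class HeteroGraph:
    # Node id (lane, depth) -> entity type (REQ, NONREQ or EMPTY).
    node_type: dict = field(default_factory=dict)
    # Directed edges (u, v): occupied slot u sits directly in front of slot v.
    in_front_of: list = field(default_factory=list)


def build_graph(lanes):
    """lanes[i][d] is the entity at depth d of lane i; d = 0 faces the aisle."""
    g = HeteroGraph()
    for i, lane in enumerate(lanes):
        for d, label in enumerate(lane):
            g.node_type[(i, d)] = label
            if label != EMPTY and d + 1 < len(lane):
                g.in_front_of.append(((i, d), (i, d + 1)))
    return g


def front_item(lane):
    """Depth of the first occupied slot seen from the aisle, or None."""
    for d, label in enumerate(lane):
        if label != EMPTY:
            return d
    return None


def deepest_free(lane):
    """Deepest empty slot reachable without passing an item, or None."""
    free = None
    for d, label in enumerate(lane):
        if label != EMPTY:
            break
        free = d
    return free


def valid_actions(lanes):
    """Retrieval acts on requested-item nodes; relocation pairs a blocking
    non-requested item with an empty-location node in another lane."""
    actions = []
    for i, lane in enumerate(lanes):
        d = front_item(lane)
        if d is None:
            continue
        if lane[d] == REQ:
            actions.append(("retrieve", (i, d)))
        elif REQ in lane[d + 1:]:  # only relocate items that block a request
            for j, other in enumerate(lanes):
                t = deepest_free(other)
                if j != i and t is not None:
                    actions.append(("relocate", (i, d), (j, t)))
    return actions


lanes = [[NONREQ, REQ], [EMPTY, NONREQ], [REQ, EMPTY]]
print(valid_actions(lanes))
# → [('relocate', (0, 0), (1, 0)), ('retrieve', (2, 0))]
```

In this toy state, the non-requested item at the front of lane 1 is not offered as a relocation because nothing requested sits behind it, which is one simple way to keep the candidate set small.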

Abstract

• DRL framework for joint retrieval and relocation scheduling in multi-deep warehouses.
• Heterogeneous graph representation captures diverse storage entity types.
• Instance-Invariant Baseline Regularization enables stable policy learning.
• A computationally efficient lower bound is derived as the instance-invariant baseline.
• Trained policy achieves lower makespan across various unseen warehouse configurations.

Multi-deep automated vehicle storage and retrieval systems (AVS/RS) offer high storage density, making them increasingly prevalent in modern logistics. However, their operational efficiency is often constrained by the need to relocate blocking items during retrieval. In this work, we consider a realistic scenario where only a subset of stored items is requested, and relocation naturally arises when target items are blocked by non-requested ones. We propose a deep reinforcement learning (DRL) framework for makespan minimization in multi-deep AVS/RS. The framework features a heterogeneous graph-based state representation that captures three distinct entity types (requested items, non-requested items, and empty locations) along with their structural relationships. The action space is designed to correspond to these node types, enabling the agent to handle both retrieval and relocation decisions within a unified framework. To address the high variance inherent in this problem, we propose the Instance-Invariant Baseline Regularization, which decouples the agent’s performance from the instance’s inherent complexity by deriving a computationally efficient lower bound for each state. Extensive experiments validate the effectiveness of the proposed approach. The agent trained with the proposed regularization demonstrates stable convergence and, more crucially, strong generalization across 64 unseen warehouse configurations of varying scale, consistently outperforming heuristic baselines. These results highlight the potential of DRL for intelligent decision-making in complex warehouse management problems.
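The abstract states only that a computationally efficient lower bound on makespan serves as the instance-invariant baseline; neither the bound nor the exact loss is given here. The sketch below is a hypothetical illustration under stated assumptions: every retrieval occupies one vehicle for at least `t_retrieve`, every relocation for at least `t_relocate`, and `n_vehicles` vehicles work fully in parallel, so the unavoidable work divided by the fleet size bounds the makespan from below. The names `makespan_lower_bound`, `lb_advantages`, and `reinforce_loss` are ours, and dividing the gap by the bound (to make signals comparable across warehouse sizes) is an assumption rather than the paper's stated formula.

```python
# Hypothetical labels for the three entity types named in the paper.
REQ, NONREQ, EMPTY = "requested", "non_requested", "empty"


def makespan_lower_bound(lanes, n_vehicles, t_retrieve, t_relocate):
    """Hypothetical bound: unavoidable work spread evenly over all vehicles.

    Assumes every retrieval costs at least t_retrieve and every relocation at
    least t_relocate of one vehicle's time, with vehicles fully parallel.
    lanes[i][d] is the entity at depth d of lane i; d = 0 faces the aisle.
    """
    n_req = 0
    n_block = 0
    for lane in lanes:
        req_depths = [d for d, x in enumerate(lane) if x == REQ]
        n_req += len(req_depths)
        if req_depths:
            deepest = max(req_depths)
            # Each non-requested item in front of the deepest requested item
            # must be moved at least once before that item can leave the lane.
            n_block += sum(1 for x in lane[:deepest] if x == NONREQ)
    return (n_req * t_retrieve + n_block * t_relocate) / n_vehicles


def lb_advantages(makespans, lower_bounds):
    """Advantage = relative gap to each instance's own bound.

    Values are <= 0 when the bound is valid; 0 means the bound was attained.
    """
    advs = []
    for c, lb in zip(makespans, lower_bounds):
        if lb <= 0:
            raise ValueError("instance needs at least one requested item")
        advs.append((lb - c) / lb)
    return advs


def reinforce_loss(log_prob_sums, makespans, lower_bounds):
    """REINFORCE surrogate: minimizing it shifts probability toward episodes
    that land closer to their lower bound. With an autodiff library the
    log-probabilities would be tensors; plain floats keep the sketch runnable."""
    advs = lb_advantages(makespans, lower_bounds)
    return -sum(a * lp for a, lp in zip(advs, log_prob_sums)) / len(advs)


lanes = [[NONREQ, REQ], [EMPTY, NONREQ], [REQ, EMPTY]]
lb = makespan_lower_bound(lanes, n_vehicles=2, t_retrieve=4.0, t_relocate=3.0)
print(lb)  # → 5.5
```

Because the bound depends only on the starting state and not on the sampled actions, subtracting it leaves the policy-gradient estimate unbiased while removing much of the variation that comes from some instances simply being harder than others.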
