The multi-depot vehicle routing problem with inter-depot routes (MDVRP-IDR) is a central challenge in route optimization, especially in complex supply chain networks with geographically dispersed distribution hubs. Building on recent breakthroughs in deep reinforcement learning (DRL) for combinatorial optimization problems (COPs), this paper introduces a novel multi-agent DRL-based framework, termed SParse Attention encoDer and multi-decodEr (SPDE), designed to tackle this critical and complex variant of the vehicle routing problem. SPDE features a Transformer-style policy network that uses a sparse graph to model the connectivity between customers and depots and a graph Transformer model to encode and learn the relationships between the nodes in this graph. Additionally, an attention-based graph pooling technique is introduced to enable the policy model to capture the graph-level structure of each problem instance with minimal computational overhead. To construct vehicle routes, each beginning and ending at one of the depots, for multi-depot routing with inter-depot connections, a decoding module is proposed in which a dedicated decoder is assigned to each vehicle, acting as an agent in a multi-agent system. Experimental evaluations on real-world traffic data from two major Canadian cities, Calgary and Edmonton, demonstrate that SPDE outperforms state-of-the-art DRL-based and heuristic methods: it reduces travel times while demonstrating superior computational efficiency compared to traditional heuristics. Further experiments validate SPDE’s generalizability to larger problem instances.
Mozhdehi et al. studied this question.
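To make two of the ingredients named in the abstract more concrete, the following is a minimal sketch of a sparse connectivity graph over depots and customers and of attention-based pooling of node embeddings into a single graph-level vector. The abstract does not specify how either is built, so everything here is an assumption made for illustration: the k-nearest-neighbour sparsification rule, the scaled dot-product scoring, the hypothetical `query` vector (which would be learned in a real model), and the use of raw coordinates as stand-in embeddings. It is not the SPDE implementation, and the per-vehicle decoders are not sketched.

```python
import math


def knn_sparse_graph(coords, k):
    """Link each node to its k nearest neighbours (assumed sparsification rule)."""
    neighbours = []
    for i, (xi, yi) in enumerate(coords):
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(embeddings, query):
    """Pool node embeddings into one vector with scaled dot-product attention.

    `query` is a hypothetical stand-in for a learned parameter.
    """
    dim = len(query)
    scores = [
        sum(q * h for q, h in zip(query, emb)) / math.sqrt(dim)
        for emb in embeddings
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * emb[d] for w, emb in zip(weights, embeddings))
        for d in range(dim)
    ]
    return pooled, weights


# Toy instance: five nodes in the plane (depots and customers are not
# distinguished in this sketch).
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
graph = knn_sparse_graph(coords, k=2)

# Raw coordinates stand in for the encoder's node embeddings.
embeddings = [[x, y] for x, y in coords]
pooled, weights = attention_pool(embeddings, query=[0.1, 0.1])
```

In this sketch the adjacency lists in `graph` would restrict which node pairs attend to each other in an encoder, while `pooled` plays the role of a graph-level summary whose cost grows linearly with the number of nodes.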