March 3, 2026Open Access

SPADE: Solving the Multi-Depot Vehicle Routing Problem with Inter-Depot Routes Using Multi-Agent Deep Reinforcement Learning

Key Points

SPDE significantly reduces travel times when optimizing vehicle routes in multi-depot scenarios, indicating a notable advancement in logistics.
The framework demonstrates superior performance to existing deep reinforcement learning and heuristic methods, illustrating its effectiveness in practical applications.
Analysis uses real-world traffic data from Calgary and Edmonton, enhancing relevance and applicability to actual supply chain challenges.
SPDE's approach allows efficient handling of larger problem instances, showcasing its potential for broader use in route optimization.

Abstract

The multi-depot vehicle routing problem with inter-depot routes (MDVRP-IDR) represents a pivotal challenge in route optimization, especially in complex supply chain networks with geographically dispersed distribution hubs. With the recent breakthroughs in deep reinforcement learning (DRL) for addressing combinatorial optimization problems (COPs), this paper introduces a novel multi-agent DRL-based framework, termed SParse Attention encoDer and multi-decodEr (SPDE), designed to tackle this critical and complex variant of the vehicle routing problem. SPDE features a Transformer-style policy network, utilizing a sparse graph to model the connectivity between customers and depots. It employs a graph Transformer model for encoding and learning the relationships between the nodes in this graph. Additionally, an attention-based graph pooling technique is introduced to enable the policy model to effectively capture the graph-level structure of each problem instance with minimal computational overhead. To effectively construct vehicle routes, each beginning and ending at one of the depots, for multi-depot routing with inter-depot connections, a decoding module is proposed, where a dedicated decoder is assigned to each vehicle, acting as an agent in a multi-agent system. Through real-world traffic data from two major Canadian cities, Calgary and Edmonton, experimental evaluations demonstrate that SPDE outperforms state-of-the-art DRL-based and heuristic methods. It reduces travel times while demonstrating superior computational efficiency compared to traditional heuristics. Further experiments validate SPDE’s generalizability in effectively solving larger problem instances.

Bookmark

View Full Paper

Bookmark

View Full Paper

SPADE: Solving the Multi-Depot Vehicle Routing Problem with Inter-Depot Routes Using Multi-Agent Deep Reinforcement Learning

Key Points

Abstract

Cite This Study