What type of study is this?

This is an experimental study.

September 23, 2025 · Open Access

A Deep Reinforcement Learning Model to Solve the Stochastic Capacitated Vehicle Routing Problem with Service Times and Deadlines

Key Points

POMO-DC reduces average delays by up to 88% on instances with 30 customers (from 169.63 to 20.35 min).
The model maintains competitive travel times while handling time-sensitive routing conditions such as service times and deadlines.
A dynamic context mechanism supplies the vehicle's cumulative travel time and delays at each decision step, letting the policy adapt to changing conditions; the model is benchmarked against Google OR-Tools using multiple metaheuristics.
The results suggest that deep reinforcement learning is a promising way to manage time uncertainty in vehicle routing.
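
The headline percentages can be checked against the average delays reported in the abstract. The short sketch below only redoes that arithmetic; the helper name `percent_reduction` is ours and is not taken from the paper.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Average delays in minutes, as quoted in the abstract.
reduction_30 = percent_reduction(169.63, 20.35)     # 30-customer instances
reduction_50 = percent_reduction(4352.43, 1098.97)  # 50-customer instances

print(f"30 customers: {reduction_30:.2f}%")
print(f"50 customers: {reduction_50:.2f}%")
```

Both values round to the reported 88% and 75%.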

Abstract

Vehicle Routing Problems are central to logistics and operational research, arising in diverse contexts such as transportation planning, manufacturing systems, and military operations. While Deep Reinforcement Learning has been successfully applied to both deterministic and stochastic variants of Vehicle Routing Problems, existing approaches often neglect critical time-sensitive conditions. This work addresses the Stochastic Capacitated Vehicle Routing Problem with Service Times and Deadlines, a challenging formulation that is well suited to modeling time-sensitive routing conditions. The proposed model, POMO-DC, integrates a novel dynamic context mechanism. At each decision step, this mechanism incorporates the vehicle's cumulative travel time and delays (features absent in prior models), enabling the policy to adapt to changing conditions and avoid time violations. The model is evaluated on stochastic instances with 20, 30, and 50 customers and benchmarked against Google OR-Tools using multiple metaheuristics. Results show that POMO-DC reduces average delays by up to 88% (from 169.63 to 20.35 min for instances of 30 customers) and 75% (from 4352.43 to 1098.97 min for instances of 50 customers), while maintaining competitive travel times. These outcomes highlight the potential of Deep Reinforcement Learning-based frameworks to learn patterns from stochastic data and effectively manage time uncertainty in Vehicle Routing Problems.
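
To make the dynamic context idea concrete, here is a minimal sketch of how per-step time features might be tracked for one vehicle. It is not the authors' implementation: the class `DynamicContext`, its fields, and the choices to measure delay at arrival and to include remaining capacity in the feature vector are all assumptions made for illustration. The abstract states only that cumulative travel time and delays are supplied at each decision step.

```python
from dataclasses import dataclass


@dataclass
class DynamicContext:
    """Hypothetical per-vehicle state updated after every customer visit."""
    remaining_capacity: float
    cumulative_travel_time: float = 0.0  # minutes spent driving so far
    clock: float = 0.0                   # driving plus service time so far
    cumulative_delay: float = 0.0        # total minutes past deadlines

    def visit(self, travel_time: float, service_time: float,
              deadline: float, demand: float) -> None:
        # Assumption: lateness is measured at arrival, not at service end.
        arrival = self.clock + travel_time
        self.cumulative_delay += max(0.0, arrival - deadline)
        self.cumulative_travel_time += travel_time
        self.clock = arrival + service_time
        self.remaining_capacity -= demand

    def features(self) -> list[float]:
        # Vector a policy network could consume at the next decision step.
        return [self.remaining_capacity,
                self.cumulative_travel_time,
                self.cumulative_delay]


# Two visits with sampled (stochastic) travel and service times, in minutes.
ctx = DynamicContext(remaining_capacity=10.0)
ctx.visit(travel_time=10.0, service_time=5.0, deadline=8.0, demand=3.0)
ctx.visit(travel_time=4.0, service_time=5.0, deadline=30.0, demand=2.0)
print(ctx.features())
```

In this sketch the first visit arrives 2 minutes late and the second is on time, so the delay feature stays at 2 while travel time keeps accumulating. A policy that sees these values can, in principle, prefer customers whose deadlines are still reachable.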
