Vehicle Routing Problems are central to logistics and operational research, arising in diverse contexts such as transportation planning, manufacturing systems, and military operations. While Deep Reinforcement Learning has been successfully applied to both deterministic and stochastic variants of Vehicle Routing Problems, existing approaches often neglect critical time-sensitive conditions. This work addresses the Stochastic Capacitated Vehicle Routing Problem with Service Times and Deadlines, a challenging formulation well suited to modeling time-sensitive routing conditions. The proposed model, POMO-DC, integrates a novel dynamic context mechanism. At each decision step, this mechanism incorporates the vehicle’s cumulative travel time and delays, features absent in prior models, enabling the policy to adapt to changing conditions and avoid time violations. The model is evaluated on stochastic instances with 20, 30, and 50 customers and benchmarked against Google OR-Tools using multiple metaheuristics. Results show that POMO-DC reduces average delays by up to 88% for 30-customer instances (from 169.63 to 20.35 min) and 75% for 50-customer instances (from 4352.43 to 1098.97 min), while maintaining competitive travel times. These outcomes highlight the potential of Deep Reinforcement Learning-based frameworks to learn patterns from stochastic data and effectively manage time uncertainty in Vehicle Routing Problems.
Marroquín-Cano et al. (Mon,) studied this question.
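To make the dynamic context idea concrete, the following minimal Python sketch shows one way the per-step features could be tracked as a vehicle visits customers. It is an illustration under stated assumptions, not the POMO-DC implementation: the names `DynamicContext` and `visit`, the choice to carry remaining capacity alongside the time features, and the definition of delay as the amount by which arrival exceeds the deadline are all hypothetical. In a stochastic setting, `travel_time` would be the realized (sampled) value for the leg just driven.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DynamicContext:
    """Hypothetical per-step state that could be fed to a routing policy."""
    remaining_capacity: float
    clock: float = 0.0         # elapsed time, including service times
    travel_time: float = 0.0   # cumulative travel time only
    total_delay: float = 0.0   # cumulative lateness against deadlines

    def as_features(self) -> list[float]:
        # Assumed feature layout: capacity, cumulative travel time, cumulative delay.
        return [self.remaining_capacity, self.travel_time, self.total_delay]


def visit(ctx: DynamicContext, travel_time: float, demand: float,
          service_time: float, deadline: float) -> DynamicContext:
    """Return the context after serving one customer.

    Delay is assumed to be max(0, arrival - deadline); service starts on arrival.
    """
    arrival = ctx.clock + travel_time
    delay = max(0.0, arrival - deadline)
    return DynamicContext(
        remaining_capacity=ctx.remaining_capacity - demand,
        clock=arrival + service_time,
        travel_time=ctx.travel_time + travel_time,
        total_delay=ctx.total_delay + delay,
    )


# Two visits: the first arrives on time, the second arrives after its deadline.
ctx = DynamicContext(remaining_capacity=10.0)
ctx = visit(ctx, travel_time=12.0, demand=3.0, service_time=5.0, deadline=15.0)
ctx = visit(ctx, travel_time=10.0, demand=2.0, service_time=5.0, deadline=20.0)
features = ctx.as_features()
```

Because the context is recomputed after every visit, a policy that receives `features` at each decision step can see accumulated lateness and, in principle, prioritize customers whose deadlines are at risk.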