ABSTRACT With rising customer expectations and increasing computational potential, many transport, manufacturing, and production operations face real‐time decision making in stochastic dynamic environments. Decision makers must find and adapt complex plans that are effective now but also flexible with respect to future developments. The challenges of searching a high‐dimensional constrained decision space for effective and flexible decisions are reflected in the three parts of the Bellman equation: the reward function, the value function, and the decision space. In the literature, reinforcement learning (RL) has shown potential to quickly evaluate the reward and value functions for a limited number of decisions but struggles to search the constrained decision spaces present in most planning problems. The question of how to combine a thorough search of the complex decision space with RL‐evaluation techniques is still open. We propose two RL‐based solution methods and detail a third one to search for and evaluate decisions in an integrated manner. Each method is inspired by one component of the Bellman equation. The first two methods dynamically shape the reward function or the decision space to encourage effective and flexible decisions or to prohibit inflexible decisions. The third method models the Bellman equation as a mixed‐integer linear programming formulation in which the value function is approximated by a neural network. We compare our proposed solution methods in a structured analysis for carefully designed problem classes. We demonstrate the effectiveness of our methods compared to prominent benchmark methods and highlight how the methods' performance depends not only on the problem classes but also on the instances' parameterizations.
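For orientation, the three components named above can be located in a generic form of the Bellman equation. The notation below is an assumption chosen for illustration and is not taken from the paper: $S_k$ denotes the state at decision point $k$, $\mathcal{X}(S_k)$ the constrained decision space, $R$ the reward function, and $V$ the value function.

```latex
% Generic Bellman equation; all symbols are illustrative assumptions.
%   \mathcal{X}(S_k) : decision space (feasible decisions in state S_k)
%   R(S_k, x)        : reward function (immediate reward of decision x)
%   V(S_{k+1})       : value function (expected future reward)
\[
  V(S_k) \;=\; \max_{x \in \mathcal{X}(S_k)}
  \Bigl\{\, R(S_k, x) \;+\; \mathbb{E}\bigl[\, V(S_{k+1}) \,\bigm|\, S_k, x \,\bigr] \Bigr\}
\]
```

Read this way, the first two methods would act on $R$ or on $\mathcal{X}(S_k)$, while the third replaces $V$ with a neural-network approximation inside the maximization.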
Hildebrandt et al. studied this question.