What question did this study set out to answer?

This research aims to enhance decision-making processes in stochastic dynamic environments using reinforcement learning techniques.

May 4, 2026Open Access

Shaping Decision Models for Stochastic Dynamic Optimization Problems via Reinforcement Learning

Key Points

This research aims to enhance decision-making processes in stochastic dynamic environments using reinforcement learning techniques.
Proposed two reinforcement learning-based methods that dynamically shape the reward function and decision space.
Developed a third method using mixed-integer linear programming with a neural network for value function approximation.
Conducted structured analysis comparing proposed methods to benchmark methods across various problem classes.
Demonstrated the effectiveness of the proposed methods in finding flexible and effective decisions.
Showed performance variation depending on problem classes and parameterization of instances.

Abstract

ABSTRACT With rising customer expectations and increasing computational potential, many transport, manufacturing, and production operations face real‐time decision making in stochastic dynamic environments. Decision makers must find and adapt complex plans that are effective now but also flexible with respect to future developments. The challenges of searching a high‐dimensional constrained decision space for effective and flexible decisions are reflected in the three parts of the Bellman equation: the reward function, the value function, and the decision space. In the literature, reinforcement learning (RL) has shown potential to quickly evaluate the reward‐ and value function for a limited number of decisions but struggles to search a constrained decision space present in most planning problems. The question of how to combine the thorough search of the complex decision space with RL‐evaluation techniques is still open. We propose two RL‐based solution methods and detail a third one to search for and evaluate decisions in an integrated manner. Each method is inspired by one component of the Bellman equation. The first two methods dynamically shape the reward function or decision space to encourage effective and flexible decisions or prohibit inflexible decisions. The third method models the Bellman equation as a mixed‐integer linear programming formulation in which the value function is approximated by a neural network. We compare our proposed solution methods in a structured analysis for carefully designed problem classes. We demonstrate the effectiveness of our methods compared to prominent benchmark methods and highlight how the methods' performances depend not only on the problem classes but also on the instances' parameterizations.

Shaping Decision Models for Stochastic Dynamic Optimization Problems via Reinforcement Learning

Key Points

Abstract

Cite This Study

Also Consider

Also Consider