In stochastic control, particularly in economics and engineering, Markov Decision Processes (MDPs) are used to model processes ranging from asset management to transportation logistics. On closer examination, these constrained MDPs often exhibit specific causal structure in their transition and reward dynamics, and exploiting this structure can simplify the computation of the optimal policy. This study introduces a framework, denoted SD-MDP, that disentangles the causal structure of the state transition and reward function dynamics. With this method, we establish theoretical guarantees on improvements in computational efficiency compared to standard MDP solvers (such as linear programming). We further derive error bounds on the optimal value approximation via Monte Carlo simulation for this family of stochastic control problems.
Liu et al. studied this question.