We develop a regression-based primal-dual martingale approach for solving discrete-time, finite-horizon MDPs. The state and action spaces may be finite or infinite (but regular enough) subsets of Euclidean space. Consequently, our method allows for the construction of tight upper- and lower-biased approximations of the value functions, providing precise estimates of the optimal policy. Importantly, we prove error bounds for the estimated duality gap featuring polynomial dependence on the time horizon. Additionally, we observe sublinear dependence of the stochastic part of the error on the cardinality/dimension of the state and action spaces. From a computational perspective, our proposed method is efficient. Unlike typical duality-based methods for optimal control problems in the literature, the Monte Carlo procedures involved here do not require nested simulations.

Keywords: Markov decision processes, reinforcement learning, dual representation, pseudo regression, Stein control functionals

MSC codes: 90C40, 65C05, 62G08
Belomestny et al. studied this question.