June 2, 2024 · Open Access

Policy Iteration for exploratory Hamilton--Jacobi--Bellman equations


Abstract

We study the policy iteration algorithm (PIA) for entropy-regularized stochastic control problems on an infinite time horizon with a large discount rate, focusing on two main scenarios. First, we analyze PIA with bounded coefficients, where the controls applied to the diffusion term satisfy a smallness condition. We establish the convergence of PIA through a uniform C^2 estimate for the value sequence it generates, and we provide a quantitative convergence analysis for this scenario. Second, we investigate PIA with unbounded coefficients but no control over the diffusion term. In this scenario, we first establish the well-posedness of the exploratory Hamilton--Jacobi--Bellman equation with linear-growth coefficients and a polynomial-growth reward function. Building on this well-posedness result, we prove the convergence of PIA by establishing quantitative, locally uniform C^1 estimates for the generated value sequence.
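To make the two alternating PIA steps concrete, here is a minimal, hypothetical sketch. It is not the authors' method or code. It assumes the standard exploratory formulation. Given a policy π^n, policy evaluation solves the linear equation ρ v^n(x) = Σ_a π^n(a|x) [r(x,a) + b(x,a) ∂_x v^n + ½σ² ∂_xx v^n − λ log π^n(a|x)]. Policy improvement then sets π^{n+1}(a|x) ∝ exp(Q^n(x,a)/λ), where Q^n = r + b ∂_x v^n + ½σ² ∂_xx v^n. To keep the sketch runnable, it makes several simplifications. It replaces the continuous action space with a finite action set and uses a one-dimensional periodic state space with an upwind finite-difference generator. The diffusion is constant and uncontrolled, as in the second scenario above. The reward cos(2πx) − a²/2, the drift b(x,a) = a, and the values ρ = 5, λ = 0.1, σ = 0.5 are illustrative assumptions, and the helper names `generator` and `policy_iteration` are hypothetical.

```python
import numpy as np


def generator(a, n, h, sigma):
    """Upwind finite-difference generator of dX = a dt + sigma dW on a periodic grid."""
    L = np.zeros((n, n))
    idx = np.arange(n)
    up, down = (idx + 1) % n, (idx - 1) % n
    d = 0.5 * sigma**2 / h**2               # diffusion weight on each neighbour
    L[idx, up] += d + max(a, 0.0) / h       # forward difference when drift > 0
    L[idx, down] += d + max(-a, 0.0) / h    # backward difference when drift < 0
    L[idx, idx] -= 2 * d + abs(a) / h       # rows sum to zero (Markov generator)
    return L


def policy_iteration(n=64, actions=(-1.0, -0.5, 0.0, 0.5, 1.0),
                     rho=5.0, lam=0.1, sigma=0.5, n_iter=50):
    """Entropy-regularized PIA on a toy discretized problem (illustrative only)."""
    actions = np.asarray(actions)
    x = np.arange(n) / n                    # periodic grid on [0, 1)
    h = 1.0 / n
    # Hypothetical reward r(x, a), shape (n, number of actions)
    r = np.cos(2 * np.pi * x)[:, None] - 0.5 * actions[None, :] ** 2
    Ls = [generator(a, n, h, sigma) for a in actions]
    pi = np.full((n, len(actions)), 1.0 / len(actions))  # uniform initial policy
    v = np.zeros(n)
    history = []                            # sup-norm change of v per iteration
    for _ in range(n_iter):
        # Policy evaluation: solve (rho I - L^pi) v = sum_a pi (r - lam log pi)
        L_pi = sum(pi[:, [k]] * L for k, L in enumerate(Ls))
        r_pi = np.sum(pi * (r - lam * np.log(np.maximum(pi, 1e-300))), axis=1)
        v_new = np.linalg.solve(rho * np.eye(n) - L_pi, r_pi)
        history.append(np.max(np.abs(v_new - v)))
        v = v_new
        # Policy improvement: Gibbs (softmax) policy proportional to exp(Q / lam)
        Q = r + np.stack([L @ v for L in Ls], axis=1)
        z = Q / lam
        z -= z.max(axis=1, keepdims=True)   # numerically stable softmax
        pi = np.exp(z)
        pi /= pi.sum(axis=1, keepdims=True)
    # Residual of the discrete exploratory HJB: rho v = lam log sum_a exp(Q / lam)
    Q = r + np.stack([L @ v for L in Ls], axis=1)
    z = Q / lam
    m = z.max(axis=1)
    soft_max = lam * (m + np.log(np.exp(z - m[:, None]).sum(axis=1)))
    residual = np.max(np.abs(rho * v - soft_max))
    return x, v, pi, history, residual


if __name__ == "__main__":
    x, v, pi, history, residual = policy_iteration()
    print("sup-norm change per iteration:", history[:8])
    print("HJB residual:", residual)
```

The residual check rests on the fixed point of the iteration. There π = softmax(Q/λ), and the evaluation equation collapses to ρ v = λ log Σ_a exp(Q(x,a)/λ), a discrete analogue of the exploratory HJB equation. The returned residual measures how far the final v is from satisfying it. The upwind scheme keeps every off-diagonal entry of L^π nonnegative, so ρI − L^π is strictly diagonally dominant and each evaluation step is well posed.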


Cite This Study

Tran et al. (2024).

synapsesocial.com/papers/68e669a3b6db6435875f5472 https://doi.org/10.48550/arxiv.2406.00612
