November 30, 2025

On the Convergence of Modified Policy Iteration in Risk-Sensitive Exponential Cost Markov Decision Processes

Key Points

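Risk-sensitive control optimizes an exponential cost objective, which prioritizes reliability against rare but costly events over average performance alone.
The paper analyzes modified policy iteration (MPI) for risk-sensitive Markov decision processes governed by a multiplicative Bellman equation, and proves both convergence and finite-time guarantees.
Value iteration and policy iteration are already well understood in this setting; MPI, which combines the strengths of both through partial policy evaluation, previously lacked a theoretical analysis.
The results provide a principled foundation for reinforcement learning algorithms that combine computational efficiency with long-term reliability.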

Abstract

Balancing Risk and Robustness in Dynamic Decision Making

Many real systems, such as networks, finance, and safety-critical autonomy, must hedge against rare but costly events. Risk-sensitive control formalizes this idea by optimizing an exponential cost objective that prioritizes reliability over average performance alone. Classical dynamic programming methods such as value iteration and policy iteration are well understood in this risk-sensitive setting. However, modified policy iteration (MPI), which combines the strengths of both through partial policy evaluation, has lacked a theoretical analysis. This paper addresses that gap. It analyzes MPI for risk-sensitive Markov decision processes governed by a multiplicative Bellman equation, develops normalization and contraction tools suited to this setting, and proves both convergence and finite-time guarantees. The results provide a principled foundation for algorithms that combine computational efficiency with robustness, supporting the development of reinforcement learning methods that emphasize long-term reliability.
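
To make the idea of partial policy evaluation with normalization concrete, here is a minimal Python sketch. It is not the paper's algorithm, which this summary does not spell out. It assumes the commonly used multiplicative Bellman operator (T h)(s) = min_a exp(theta * c(s, a)) * sum_s' P(s' | s, a) * h(s'), normalizes by the value at a reference state, and assumes strictly positive transition probabilities. The function name `risk_sensitive_mpi` and all of its parameters (`theta`, `m`, `iters`, `ref`) are hypothetical.

```python
import numpy as np

def risk_sensitive_mpi(P, c, theta=0.5, m=5, iters=200, ref=0):
    """Illustrative normalized MPI for an exponential-cost MDP (assumed form).

    P: array of shape (A, S, S), P[a, s, s2] = transition probability.
    c: array of shape (S, A), per-step cost.
    m: number of partial-evaluation sweeps per policy improvement.
    ref: reference state used for normalization (assumption).
    """
    A, S, _ = P.shape
    G = np.exp(theta * c)              # multiplicative cost factors, shape (S, A)
    h = np.ones(S)                     # positive initial value function
    lam = 1.0                          # running estimate of the growth factor
    states = np.arange(S)
    for _ in range(iters):
        # Policy improvement: greedy with respect to the multiplicative operator.
        Q = G * (P @ h).T              # Q[s, a] = G[s, a] * sum_s2 P[a, s, s2] * h[s2]
        pi = Q.argmin(axis=1)
        # Partial policy evaluation: m sweeps of the fixed-policy operator.
        P_pi = P[pi, states]           # row s is P[pi[s], s, :], shape (S, S)
        g_pi = G[states, pi]
        for _ in range(m):
            h = g_pi * (P_pi @ h)
            lam = h[ref]               # growth factor read off at the reference state
            h = h / lam                # normalize so that h[ref] == 1
    return pi, h, lam
```

With `m = 1` the loop reduces to a normalized form of value iteration, and as `m` grows each policy is evaluated more fully, which is the sense in which MPI sits between value iteration and policy iteration. Under the assumptions above, `lam` estimates the multiplicative growth rate of the exponential cost, and `np.log(lam) / theta` would be the corresponding risk-sensitive average cost.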
