What does this research mean for the field?

Average optimal policies for sa-rectangular robust Markov decision processes (RMDPs) can be chosen stationary and deterministic, whereas for s-rectangular RMDPs average optimal policies may not exist at all. Approximately Blackwell optimal policies always exist for sa-rectangular RMDPs.

What question did this study set out to answer?

This research aims to investigate robust Markov decision processes (RMDPs) for average and Blackwell optimality criteria beyond discounted returns.

March 6, 2026

Beyond Discounted Returns: Robust Markov Decision Processes with Average and Blackwell Optimality

Key Points

Analyzed average optimal policies in sa-rectangular and s-rectangular RMDPs.
Examined the existence and nature of stationary policies.
Studied Blackwell optimality and provided conditions for existence.
Developed algorithms to compute optimal average returns.
Explored connections between RMDPs and stochastic games.
Stationary and deterministic average optimal policies exist for sa-rectangular RMDPs.
Average optimal policies may not exist for s-rectangular RMDPs.
Approximately Blackwell optimal policies always exist for sa-rectangular RMDPs.
Provided a sufficient condition for the existence of Blackwell optimal policies.
Emphasized the advantages of distance-based sa-rectangular models over s-rectangular models.
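
For readers unfamiliar with the terms used in the key points, the block below restates the standard textbook definitions in LaTeX. The notation (\mathcal{U}, R_\gamma, R_{\mathrm{avg}}, \gamma_0) is assumed here for illustration and is not taken from this page; the average return is written with one common convention (a liminf), and the paper's exact formulation may differ.

```latex
% sa-rectangular: an independent uncertainty set for every state-action pair
\mathcal{U} = \mathop{\times}_{(s,a) \in \mathcal{S} \times \mathcal{A}} \mathcal{U}_{sa},
\qquad \mathcal{U}_{sa} \subseteq \Delta(\mathcal{S})

% s-rectangular: an independent uncertainty set for every state only
\mathcal{U} = \mathop{\times}_{s \in \mathcal{S}} \mathcal{U}_{s},
\qquad \mathcal{U}_{s} \subseteq \Delta(\mathcal{S})^{\mathcal{A}}

% Discounted and average return of policy \pi under transition kernel P
R_{\gamma}(\pi, P) = \mathbb{E}^{\pi, P}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \right],
\qquad
R_{\mathrm{avg}}(\pi, P) = \liminf_{T \to \infty} \frac{1}{T}\,
\mathbb{E}^{\pi, P}\!\left[ \sum_{t=0}^{T-1} r_{t} \right]

% Robust objective: maximize the worst-case return over the uncertainty set
\max_{\pi} \; \min_{P \in \mathcal{U}} \; R(\pi, P)

% Blackwell optimality: \pi is discount optimal for all discount factors close enough to 1
\exists\, \gamma_{0} \in (0,1) : \quad
\pi \in \arg\max_{\pi'} \min_{P \in \mathcal{U}} R_{\gamma}(\pi', P)
\quad \text{for all } \gamma \in (\gamma_{0}, 1)
```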

Abstract

Novel Insights on Robust Markov Decision Processes with Average Reward and Blackwell Optimality Criteria

Robust Markov decision processes (RMDPs) have been studied extensively when the objective is the discounted return, but little is known for average optimality and Blackwell optimality. We show that average optimal policies can be chosen stationary and deterministic for sa-rectangular RMDPs, but perhaps surprisingly, we show that for s-rectangular RMDPs average optimal policies may not exist, and if they do exist, they may not be stationary. We also study Blackwell optimality for sa-rectangular RMDPs, where we show that approximately Blackwell optimal policies always exist, although exact Blackwell optimal policies may not exist. We provide a general sufficient condition for their existence. We then discuss the connection between average and Blackwell optimality, and we describe several algorithms to compute the optimal average return. Interestingly, our approach leverages the connections between RMDPs and stochastic games. Overall, our paper emphasizes the superior practical properties of distance-based sa-rectangular models over s-rectangular models for average and Blackwell optimality.
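
To make the sa-rectangular setting concrete, here is a minimal Python sketch of standard discounted robust value iteration on a toy two-state problem. It is not one of the paper's algorithms, which this page does not describe. The function names, the toy rewards, and the use of a finite list of candidate next-state distributions per state-action pair are all assumptions made for illustration (the abstract refers to distance-based uncertainty sets instead). The value scaled by (1 - gamma) is printed only as a rough proxy: in classical non-robust finite MDPs that quantity converges to the optimal average return as gamma approaches 1.

```python
"""Discounted robust value iteration on a toy sa-rectangular RMDP (illustrative sketch)."""


def robust_q(s, a, v, rewards, kernels, gamma):
    # sa-rectangularity: the adversary picks the worst next-state distribution
    # for this (s, a) pair independently of every other pair.
    return min(
        rewards[s][a] + gamma * sum(p_next * v_next for p_next, v_next in zip(p, v))
        for p in kernels[s][a]
    )


def robust_value_iteration(rewards, kernels, gamma, tol=1e-10, max_iter=100_000):
    n_states = len(rewards)
    v = [0.0] * n_states
    for _ in range(max_iter):
        # Robust Bellman update: best action against the worst-case kernel.
        new_v = [
            max(robust_q(s, a, v, rewards, kernels, gamma)
                for a in range(len(rewards[s])))
            for s in range(n_states)
        ]
        gap = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        if gap < tol:
            break
    # Greedy stationary deterministic policy with respect to the final values.
    policy = [
        max(range(len(rewards[s])),
            key=lambda a, s=s: robust_q(s, a, v, rewards, kernels, gamma))
        for s in range(n_states)
    ]
    return v, policy


# Hypothetical toy problem. State 1 is absorbing with reward 0.
# In state 0, action 0 pays 1.0 but may leak to state 1; action 1 pays 0.6 safely.
rewards = [[1.0, 0.6], [0.0]]
kernels = [
    [[[1.0, 0.0], [0.5, 0.5]],  # state 0, action 0: two candidate distributions
     [[1.0, 0.0]]],             # state 0, action 1: no uncertainty
    [[[0.0, 1.0]]],             # state 1, single action: stay in state 1
]
gamma = 0.9

v, policy = robust_value_iteration(rewards, kernels, gamma)
print(policy)                        # [1, 0]
print([(1 - gamma) * x for x in v])  # roughly [0.6, 0.0]
```

In this toy instance the robust policy prefers the lower-paying but safe action in state 0, because the worst-case kernel for the higher-paying action sends the process to the zero-reward absorbing state half of the time.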

Cite This Study

Petrik et al. studied this question.

synapsesocial.com/papers/69aa70e7531e4c4a9ff5b116 https://doi.org/10.1287/opre.2023.0694
