October 20, 2025 | Open Access

Global Convergence for Average Reward Constrained MDPs with Primal-Dual Actor Critic Algorithm

Key Points

The algorithm attains global convergence and constraint violation rates of $\tilde{\mathcal{O}}(1/T^{0.5})$ over a horizon of length $T$ when the mixing time is known to the learner.
When the mixing time is unknown, the rates become $\tilde{\mathcal{O}}(1/T^{0.5-\epsilon})$, provided the horizon $T$ is sufficiently large.
The rates match the theoretical lower bound for MDPs, setting a new benchmark in the theoretical study of average reward CMDPs.
The approach is a Primal-Dual Natural Actor-Critic algorithm for the infinite-horizon average reward setting with general policy parametrization.

Abstract

This paper investigates infinite-horizon average reward Constrained Markov Decision Processes (CMDPs) with general parametrization. We propose a Primal-Dual Natural Actor-Critic algorithm that adeptly manages constraints while ensuring a high convergence rate. In particular, our algorithm achieves global convergence and constraint violation rates of $\tilde{\mathcal{O}}(1/\sqrt{T})$ over a horizon of length $T$ when the mixing time, $\tau_{\mathrm{mix}}$, is known to the learner. In the absence of knowledge of $\tau_{\mathrm{mix}}$, the achievable rates change to $\tilde{\mathcal{O}}(1/T^{0.5-\epsilon})$ provided that $T \geq \tilde{\mathcal{O}}(\tau_{\mathrm{mix}}^{2/\epsilon})$. Our results match the theoretical lower bound for Markov Decision Processes and establish a new benchmark in the theoretical exploration of average reward CMDPs.
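
To get a feel for the two regimes described in the abstract, the rates can be evaluated numerically. The sketch below is illustrative only and is not taken from the paper: the function names `rate_known_mixing`, `rate_unknown_mixing`, and `min_horizon` are hypothetical, and the constants and logarithmic factors hidden by the $\tilde{\mathcal{O}}$ notation are dropped, so the outputs indicate orders of magnitude rather than actual bounds.

```python
import math


def rate_known_mixing(T: int) -> float:
    """Order of the rate 1/sqrt(T) (mixing time known); constants and logs dropped."""
    if T <= 0:
        raise ValueError("T must be positive")
    return 1.0 / math.sqrt(T)


def rate_unknown_mixing(T: int, eps: float) -> float:
    """Order of the rate 1/T^(0.5 - eps) (mixing time unknown); constants and logs dropped."""
    if T <= 0:
        raise ValueError("T must be positive")
    if not 0.0 < eps < 0.5:
        raise ValueError("eps is assumed to lie in (0, 0.5)")
    return 1.0 / T ** (0.5 - eps)


def min_horizon(tau_mix: float, eps: float) -> float:
    """Order of the horizon threshold tau_mix^(2/eps); constants and logs dropped."""
    if tau_mix <= 0 or eps <= 0:
        raise ValueError("tau_mix and eps must be positive")
    return tau_mix ** (2.0 / eps)


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    T, eps, tau_mix = 10_000, 0.25, 10.0
    print("known mixing time:  ", rate_known_mixing(T))
    print("unknown mixing time:", rate_unknown_mixing(T, eps))
    print("horizon threshold:  ", min_horizon(tau_mix, eps))
```

A smaller $\epsilon$ brings the unknown-mixing-time rate closer to $1/\sqrt{T}$, but the threshold $\tau_{\mathrm{mix}}^{2/\epsilon}$ on the horizon grows quickly as $\epsilon$ shrinks, which is the trade-off the abstract's condition expresses.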
