This paper studies a black-box methodology for optimizing ergodic stochastic systems, focusing on the construction of scalar measures that reliably indicate progress toward optimality. Our starting point is a state-value quantity that oscillates by nature and does not converge under standard conditions. We show that, despite these fluctuations, the quantity admits a recursive representation derived from a one-step-ahead fixed-local-optimal policy. The approach relies on identifying a Lyapunov-like function whose evolution reflects the long-run behavior of the system without requiring explicit knowledge of its internal dynamics. This function provides a monotonic indicator, non-increasing over time, that remains valid for any initial probability distribution. Whenever an optimal trajectory of the Markov chain exists, the proposed method is guaranteed to converge to it. We also give a constructive procedure for obtaining the Lyapunov-like function and validate the methodology through theoretical analysis and numerical simulations.
Julio B. Clempner studied this question.
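The abstract does not state the form of the Lyapunov-like function, so the following is only an illustrative sketch under explicit assumptions, not the author's construction. It assumes a fixed policy that induces a row-stochastic, ergodic transition matrix `P`. Under that assumption, a classical quantity that is non-increasing for every initial distribution is the L1 distance between the state distribution `p_t` and the stationary distribution `pi`, because `||(p - pi) P||_1 <= ||p - pi||_1` for any stochastic `P`. The helper names (`step`, `stationary`, `l1_to_stationary`, `trace`) and the example matrix are hypothetical, and the code performs no optimization. It only checks the monotone-indicator property numerically.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def step(p: Vector, P: Matrix) -> Vector:
    """One step of the distribution dynamics p_{t+1} = p_t P (P is row-stochastic)."""
    n = len(p)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]


def stationary(P: Matrix) -> Vector:
    """Solve pi (P - I) = 0 with sum(pi) = 1 by Gauss-Jordan elimination.

    For an irreducible chain one balance equation is redundant, so the last
    one is replaced by the normalization constraint.
    """
    n = len(P)
    # Row j encodes sum_i pi_i * (P[i][j] - delta_ij) = 0 (augmented with RHS 0).
    A = [[P[i][j] - (1.0 if i == j else 0.0) for i in range(n)] + [0.0] for j in range(n)]
    A[-1] = [1.0] * n + [1.0]  # normalization: sum_i pi_i = 1
    for c in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(n):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[r][n] / A[r][r] for r in range(n)]


def l1_to_stationary(p: Vector, pi: Vector) -> float:
    """Hypothetical Lyapunov-like candidate: L1 distance to the stationary law."""
    return sum(abs(a - b) for a, b in zip(p, pi))


def trace(P: Matrix, p0: Vector, steps: int) -> List[float]:
    """Evaluate the candidate along the trajectory p_0, p_1, ..., p_{steps-1}."""
    pi = stationary(P)
    p, values = list(p0), []
    for _ in range(steps):
        values.append(l1_to_stationary(p, pi))
        p = step(p, P)
    return values


if __name__ == "__main__":
    # Illustrative 3-state ergodic chain (irreducible, aperiodic via self-loops).
    P = [[0.5, 0.5, 0.0],
         [0.2, 0.5, 0.3],
         [0.0, 0.4, 0.6]]
    values = trace(P, [1.0, 0.0, 0.0], 10)
    print([round(v, 4) for v in values])  # a non-increasing sequence
```

The L1 distance was chosen because its monotonicity follows directly from `P` having non-negative rows that sum to one, so it holds for any starting distribution. This mirrors the "valid for any initial probability distribution" property described above, while the paper's own function may differ.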