Abstract We study countably infinite Markov decision processes with Büchi objectives, which ask to visit a given subset of states infinitely often. A question left open by T. P. Hill (1979) is whether there always exist ε -optimal Markov strategies, i. e. , strategies that base decisions only on the current state and on the clock (the number of steps taken so far). We provide a negative answer to this question by constructing a non-trivial counterexample.
Kiefer et al. (Sat,) studied this question.