Abstract The quotient POMDP under bounded-agent indistinguishability is the unique minimal environment abstraction - a bounded-interaction Myhill–Nerode theorem for partially observable Markov decision processes. We develop this framework for finite POMDPs relative to agents modelled as finite-state controllers (FSCs) with m memory nodes and horizon T: a closed-loop pseudometric based on the 1-Wasserstein distance defines indistinguishability, and histories that no bounded agent can distinguish are merged into the quotient. We extend the framework to ε-approximate quotients with an interactive rate–distortion bound via directed in-formation, prove a data-processing monotonicity theorem with implications for hierarchical RL, and derive a compositional horizon-scaling theorem that bounds multi-layer error accumulation. The framework is tractable for the bounded-agent regime it targets: greedy selection reduces 6,405 FSCs to 5 with no partition loss, and sampling-based distance estimation scales to |S|=100 in sub-second time. Experiments on Tiger, GridWorld (up to 10×10, |S|=100), and random POMDPs validate monotonicity, show horizon scaling to T=10 via layered construction, demonstrate principal-controller compression for m=2 beyond T=4, and confirm that the pseudometric value bound is informative while the worst-case aggregation bound is vacuous at realistic scales. Notes Preprint manuscript.
Anthony T Nixon (Sat,) studied this question.