What does this research mean for the field?

The quotient POMDP under bounded-agent indistinguishability provides a unique minimal environment abstraction for finite partially observable Markov decision processes. Novelty: ClaimNovelty.NOVEL_FINDING. Consensus alignment: ConsensusAlignment.NEUTRAL.

What question did this study set out to answer?

The aim is to establish a minimal environment abstraction for finite partially observable Markov decision processes using bounded-agent indistinguishability.

March 8, 2026Open Access

Finite-POMDP Abstraction via Agent-Bounded Indistinguishability: A Bounded-Interaction Myhill–Nerode Theorem

Key Points

The aim is to establish a minimal environment abstraction for finite partially observable Markov decision processes using bounded-agent indistinguishability.
Developed a closed-loop pseudometric based on the 1-Wasserstein distance to define indistinguishability.
Extended the framework to ε-approximate quotients with interactive rate-distortion bounds.
Proved a data-processing monotonicity theorem and a compositional horizon-scaling theorem.
Conducted experiments on various environments such as Tiger and GridWorld.
Implemented greedy selection to reduce FSCs with no partition loss.
Greedy selection effectively reduced 6,405 FSCs to just 5 without partition loss.
Pseudometric value bounds were found to be informative, while worst-case bounds were not useful at realistic scales.
Experiments validated the monotonicity and demonstrated horizon scaling to T=10 through layered construction.
Principal-controller compression was observed for conditions beyond T=4.

Abstract

Abstract The quotient POMDP under bounded-agent indistinguishability is the unique minimal environment abstraction - a bounded-interaction Myhill–Nerode theorem for partially observable Markov decision processes. We develop this framework for finite POMDPs relative to agents modelled as finite-state controllers (FSCs) with m memory nodes and horizon T: a closed-loop pseudometric based on the 1-Wasserstein distance defines indistinguishability, and histories that no bounded agent can distinguish are merged into the quotient. We extend the framework to ε-approximate quotients with an interactive rate–distortion bound via directed in-formation, prove a data-processing monotonicity theorem with implications for hierarchical RL, and derive a compositional horizon-scaling theorem that bounds multi-layer error accumulation. The framework is tractable for the bounded-agent regime it targets: greedy selection reduces 6,405 FSCs to 5 with no partition loss, and sampling-based distance estimation scales to |S|=100 in sub-second time. Experiments on Tiger, GridWorld (up to 10×10, |S|=100), and random POMDPs validate monotonicity, show horizon scaling to T=10 via layered construction, demonstrate principal-controller compression for m=2 beyond T=4, and confirm that the pseudometric value bound is informative while the worst-case aggregation bound is vacuous at realistic scales. Notes Preprint manuscript.

Bookmark

View Full Paper

Bookmark

View Full Paper

Finite-POMDP Abstraction via Agent-Bounded Indistinguishability: A Bounded-Interaction Myhill–Nerode Theorem

Key Points

Abstract

Cite This Study