What type of study is this?

This is a quantitative study.

September 20, 2025

Robust Finite-Memory Policy Gradients for Hidden-Model POMDPs

Key Points

The approach computes policies that aim for sufficient performance on every POMDP in a hidden-model POMDP (HM-POMDP), where the true environment is unknown at execution time.
In the empirical evaluation, the method produces policies that are more robust than various baselines and that generalize better to unseen POMDPs.
The approach combines deductive formal verification, which finds a worst-case POMDP within the HM-POMDP for robust policy evaluation, with subgradient ascent, which optimizes the candidate policy against that worst case.
The method scales to HM-POMDPs that consist of over a hundred thousand environments.

Abstract

Partially observable Markov decision processes (POMDPs) model specific environments in sequential decision-making under uncertainty. Critically, optimal policies for POMDPs may not be robust against perturbations in the environment. Hidden-model POMDPs (HM-POMDPs) capture sets of different environment models, that is, POMDPs with a shared action and observation space. The intuition is that the true model is hidden among a set of potential models, and it is unknown which model will be the environment at execution time. A policy is robust for a given HM-POMDP if it achieves sufficient performance for each of its POMDPs. We compute such robust policies by combining two orthogonal techniques: (1) a deductive formal verification technique that supports tractable robust policy evaluation by computing a worst-case POMDP within the HM-POMDP, and (2) subgradient ascent to optimize the candidate policy for a worst-case POMDP. The empirical evaluation shows that, compared to various baselines, our approach (1) produces policies that are more robust and generalize better to unseen POMDPs, and (2) scales to HM-POMDPs that consist of over a hundred thousand environments.
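
The alternation described in the abstract (evaluate the candidate policy to find a worst-case model, then take an ascent step against that model) can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration and is not the authors' implementation: each "environment" is reduced to a stateless reward vector over two actions rather than a POMDP, the policy is a memoryless softmax rather than a finite-memory controller, the worst case is found by brute-force enumeration rather than by formal verification, and all names (`MODELS`, `worst_case`, `robust_subgradient_ascent`) are hypothetical.

```python
import math


def softmax(theta):
    """Turn unconstrained parameters into action probabilities."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def value(policy, rewards):
    """Expected reward of a stochastic policy in one (stateless) model."""
    return sum(p * r for p, r in zip(policy, rewards))


def worst_case(policy, models):
    """Brute-force stand-in for robust policy evaluation:
    return the index and value of the model where the policy does worst."""
    values = [value(policy, r) for r in models]
    idx = min(range(len(models)), key=values.__getitem__)
    return idx, values[idx]


def robust_subgradient_ascent(models, steps=2000, lr=0.5):
    """Maximize min-over-models value by stepping along the gradient
    of the current worst-case model (a subgradient of the minimum)."""
    theta = [0.0] * len(models[0])
    for t in range(steps):
        pi = softmax(theta)
        idx, v = worst_case(pi, models)
        r = models[idx]
        # Softmax policy gradient: d/d theta_a of sum_b pi_b * r_b = pi_a * (r_a - v)
        grad = [p * (ra - v) for p, ra in zip(pi, r)]
        step = lr / math.sqrt(t + 1)  # diminishing step size
        theta = [th + step * g for th, g in zip(theta, grad)]
    return softmax(theta)


# Two hypothetical environments that reward opposite actions.
MODELS = [[1.0, 0.0], [0.0, 1.0]]
policy = robust_subgradient_ascent(MODELS)
_, robust_value = worst_case(policy, MODELS)
print(policy, robust_value)
```

In this toy setting, a policy tuned to either model alone scores 0 in the other, whereas the robust iteration settles near a uniform mix whose worst-case value is close to 0.5. The diminishing step size is a common choice for subgradient methods; the source does not state which schedule the authors use.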

Authors

Maris F. L. Galesloot

Roman Andriushchenko

Brno University of Technology

Milan Češka

Brno University of Technology

Institutions

Radboud University Nijmegen

Ruhr University Bochum

Brno University of Technology
