What does this research mean for the field?

The Probabilistic Induction with Genetic Local Search (PI-GLS) framework substantially improves convergence efficiency, robustness to sensor noise, and the interpretability of learned reward models in reinforcement learning, while matching the performance of manually engineered Reward Machines (RMs). Novelty: methodological. Consensus alignment: neutral.

What question did this study set out to answer?

How can reward machine (RM) induction be made robust to noisy observations and uncertain labels? The authors address this question with a new framework, PI-GLS.

May 30, 2026 · Open Access

Noise-robust reward machine induction via probabilistic modeling and genetic local search

Key Points

Introduced Probabilistic Induction with Genetic Local Search (PI-GLS), a framework that integrates probabilistic perception, inductive logic programming, and evolutionary optimization.
Used Bayesian inference to handle label uncertainty and genetic local search to refine automaton structures.
Proposed a belief-aware reward shaping strategy that uses distributions over RM states to guide policy learning under uncertainty.
Substantially improved convergence efficiency and robustness to sensor noise on benchmark domains.
Improved the interpretability of learned reward models and matched the performance of manually engineered RMs, even under severe noise.
Based on these benchmark results, argued that PI-GLS scales to real-world decision-making where stochastic feedback and partial observability are unavoidable.
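The summary does not say how PI-GLS encodes label uncertainty or tracks RM state under it. As a minimal sketch, assuming a deterministic RM stored as a transition dictionary and a sensor with a known confusion matrix, Bayesian label inference followed by belief propagation over RM states could look like the following. The function names, the coffee/office task, and the sensor accuracies are hypothetical illustrations, not the authors' implementation.

```python
from typing import Dict, Tuple

State = str
Label = str


def label_posterior(
    detected: Label,
    prior: Dict[Label, float],
    confusion: Dict[Tuple[Label, Label], float],
) -> Dict[Label, float]:
    """Bayes' rule over true labels: P(true=l | detected) ∝ P(detected | true=l) * P(true=l).

    Assumption: confusion[(detected, true)] holds the sensor likelihood P(detected | true).
    """
    unnormalized = {l: confusion.get((detected, l), 0.0) * p for l, p in prior.items()}
    z = sum(unnormalized.values())
    if z == 0.0:
        raise ValueError("detected label has zero likelihood under every true label")
    return {l: v / z for l, v in unnormalized.items()}


def update_belief(
    belief: Dict[State, float],
    label_probs: Dict[Label, float],
    delta: Dict[Tuple[State, Label], State],
) -> Dict[State, float]:
    """Propagate a distribution over RM states through one uncertain label.

    Assumption: a (state, label) pair with no entry in `delta` is a self-loop.
    """
    new_belief = {u: 0.0 for u in belief}
    for u, b_u in belief.items():
        for label, p_label in label_probs.items():
            u_next = delta.get((u, label), u)
            new_belief[u_next] = new_belief.get(u_next, 0.0) + b_u * p_label
    return new_belief


# Hypothetical two-step task: observe "c" (coffee), then "o" (office).
DELTA = {("u0", "c"): "u1", ("u1", "o"): "u2"}
LABELS = ["c", "o", "none"]
PRIOR = {l: 1.0 / len(LABELS) for l in LABELS}
# Hypothetical sensor: reports the true label with probability 0.8,
# otherwise each of the other two labels with probability 0.1.
CONFUSION = {(d, t): (0.8 if d == t else 0.1) for d in LABELS for t in LABELS}
```

Starting from certainty in `u0`, a detected `c` moves roughly 0.8 of the belief to `u1`; a subsequent detected `o` yields roughly 0.18 on `u0`, 0.18 on `u1`, and 0.64 on the accepting state `u2`. Rather than committing to one noisy label, the belief keeps weight on every RM state the agent might be in.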

Abstract

Reinforcement learning (RL) in partially observable and noisy environments remains a central challenge because of the difficulty of inferring latent structure, managing label uncertainty, and ensuring robust policy optimization. Reward Machines (RMs) offer a structured formalism for representing non-Markovian rewards through symbolic automata, but existing induction methods are fragile under stochastic observations. To address this limitation, we introduce Probabilistic Induction with Genetic Local Search (PI-GLS), a unified framework that integrates probabilistic perception, inductive logic programming, and evolutionary optimization. PI-GLS achieves noise-robust RM induction by combining Bayesian inference for uncertain labels, sampling-based symbolic abstraction, and genetic local refinement of automaton structures. In addition, we propose a belief-aware reward shaping strategy that leverages distributions over RM states to guide policy learning under uncertainty. Extensive experiments on benchmark domains show that PI-GLS substantially improves convergence efficiency, robustness to sensor noise, and interpretability of learned reward models, achieving performance on par with manually engineered RMs even under severe noise. These results demonstrate the scalability and effectiveness of PI-GLS for autonomous decision-making in real-world scenarios where stochastic feedback and partial observability are unavoidable.
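The abstract describes belief-aware reward shaping only as leveraging distributions over RM states. One plausible form, assumed here rather than taken from the paper, is potential-based shaping with the potential averaged over the belief: the shaped reward is r + γΦ(b′) − Φ(b), where Φ(b) = Σ_u b(u)·φ(u). The sketch below uses the negative number of RM transitions to an accepting state as a hypothetical φ; `distance_potentials`, `expected_potential`, and `shaped_reward` are illustrative names, not the authors' API.

```python
from collections import deque
from typing import Dict, Iterable, Tuple

State = str
Label = str


def distance_potentials(
    states: Iterable[State],
    delta: Dict[Tuple[State, Label], State],
    accepting: Iterable[State],
) -> Dict[State, float]:
    """Hypothetical potential: phi(u) = -(fewest RM transitions from u to an accepting state).

    Assumes every transition target in `delta` is listed in `states`. States that
    cannot reach an accepting state get one step worse than the worst reachable state.
    """
    states = list(states)
    predecessors = {u: set() for u in states}
    for (u, _label), v in delta.items():
        predecessors[v].add(u)
    # Breadth-first search backwards from the accepting states.
    dist = {u: 0 for u in accepting}
    queue = deque(dist)
    while queue:
        v = queue.popleft()
        for u in predecessors[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    worst = max(dist.values(), default=0) + 1
    return {u: -float(dist.get(u, worst)) for u in states}


def expected_potential(belief: Dict[State, float], phi: Dict[State, float]) -> float:
    """Phi(b): the potential averaged over the belief on RM states."""
    return sum(p * phi[u] for u, p in belief.items())


def shaped_reward(
    reward: float,
    belief: Dict[State, float],
    next_belief: Dict[State, float],
    phi: Dict[State, float],
    gamma: float = 0.99,
) -> float:
    """Environment reward plus the potential-based term gamma * Phi(b') - Phi(b)."""
    return reward + gamma * expected_potential(next_belief, phi) - expected_potential(belief, phi)
```

For a chain u0 → u1 → u2 with u2 accepting, φ is −2, −1, and 0. A belief shift from certainty in u0 to 0.2 on u0 and 0.8 on u1 then earns a shaping bonus of about 0.8 when γ = 1. The potential-based form is a common choice because, for potentials defined on the agent's full state, it leaves optimal policies unchanged; the summary does not say whether PI-GLS relies on that property.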
