Reinforcement learning (RL) in partially observable and noisy environments remains a central challenge because of the difficulty in inferring latent structures, managing label uncertainty, and ensuring robust policy optimization. Reward Machines (RMs) offer a structured formalism for representing non-Markovian rewards through symbolic automata, but existing methods exhibit fragility under stochastic observations. To address this limitation, we introduce Probabilistic Induction with Genetic Local Search (PI-GLS), a unified framework that integrates probabilistic perception, inductive logic programming, and evolutionary optimization. PI-GLS achieves noise-robust RM induction by combining Bayesian inference for uncertain labels, sampling-based symbolic abstraction, and genetic local refinement of automaton structures. In addition, we propose a belief-aware reward shaping strategy that leverages distributions over RM states to guide policy learning under uncertainty. Extensive experiments on benchmark domains show that PI-GLS substantially improves convergence efficiency, robustness to sensor noise, and interpretability of learned reward models, achieving performance on par with manually engineered RMs even under severe noise. These results demonstrate the scalability and effectiveness of PI-GLS for autonomous decision-making in real-world scenarios where stochastic feedback and partial observability are unavoidable.
Zhu et al. (Wed,) studied this question.
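The belief-aware reward shaping idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the form PI-GLS uses, so everything below is an assumption made for illustration: a hypothetical three-state reward machine, a perception module that outputs a probability for each symbol, and standard potential-based shaping applied to the expected potential under the belief. The names `TRANSITIONS`, `POTENTIAL`, `update_belief`, and `shaped_reward` are invented here and do not come from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical reward machine (not from the paper): 0 --a--> 1 --b--> 2.
# Any (state, symbol) pair not listed is treated as a self-loop.
TRANSITIONS: Dict[Tuple[int, str], int] = {(0, "a"): 1, (1, "b"): 2}
NUM_STATES = 3

# Assumed per-state potentials; states closer to the final state score higher.
POTENTIAL: List[float] = [0.0, 0.5, 1.0]


def rm_step(state: int, symbol: str) -> int:
    """Deterministic reward-machine transition with a self-loop default."""
    return TRANSITIONS.get((state, symbol), state)


def update_belief(belief: List[float], label_probs: Dict[str, float]) -> List[float]:
    """Push a belief over RM states through one noisy labelling step.

    label_probs is assumed to be a distribution over mutually exclusive
    symbols (summing to 1), as produced by some probabilistic perception module.
    """
    new_belief = [0.0] * NUM_STATES
    for state, state_prob in enumerate(belief):
        for symbol, symbol_prob in label_probs.items():
            new_belief[rm_step(state, symbol)] += state_prob * symbol_prob
    return new_belief


def expected_potential(belief: List[float]) -> float:
    """Expected potential under the current belief over RM states."""
    return sum(p * phi for p, phi in zip(belief, POTENTIAL))


def shaped_reward(env_reward: float, belief: List[float],
                  next_belief: List[float], gamma: float = 0.99) -> float:
    """Potential-based shaping on beliefs: r + gamma * Phi(b') - Phi(b)."""
    return env_reward + gamma * expected_potential(next_belief) - expected_potential(belief)


if __name__ == "__main__":
    belief = [1.0, 0.0, 0.0]                       # certain the RM is in state 0
    label_probs = {"a": 0.8, "b": 0.1, "none": 0.1}  # noisy detection of "a"
    next_belief = update_belief(belief, label_probs)
    print(next_belief, shaped_reward(0.0, belief, next_belief))
```

Under these assumptions, the shaping term grows in proportion to how much probability mass moves toward the final RM state, so a weakly detected event yields a proportionally smaller bonus rather than an all-or-nothing one.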