Efficient exploration remains a challenge in reinforcement learning (RL), especially in stochastic or complex environments. We introduce Predictive Improvement through Latent space OpTimisation (PILOT), an intrinsically motivated RL algorithm that rewards actions leading to improvements in the agent’s model of the environment’s dynamics. PILOT optimizes an intrinsic reward signal based on the reduction of epistemic uncertainty, thereby encouraging structured exploration. In evaluations against benchmark intrinsic motivation algorithms in challenging environments, PILOT outperforms these baselines and is robust to stochastic distractions.
McCaffrey et al. (Thu,) studied this question.
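
As a rough illustration of an intrinsic reward based on epistemic uncertainty reduction, the sketch below scores a transition by how much a dynamics model's uncertainty drops after training on it. It is not the PILOT implementation: the ensemble of 1-D linear models, the use of ensemble disagreement as an uncertainty proxy, and the names `EnsembleDynamics` and `intrinsic_reward` are all assumptions made for this example, and no latent space is modelled.

```python
import random
from statistics import pvariance


class EnsembleDynamics:
    """Hypothetical ensemble of 1-D linear models: next_state ~ w * state + b * action."""

    def __init__(self, n_members=5, lr=0.1, seed=0):
        rng = random.Random(seed)
        # Each member starts from different random weights, so members disagree at first.
        self.params = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n_members)]
        self.lr = lr

    def predict(self, state, action):
        return [w * state + b * action for w, b in self.params]

    def uncertainty(self, state, action):
        # Assumption: disagreement between members stands in for epistemic uncertainty.
        return pvariance(self.predict(state, action))

    def update(self, state, action, next_state):
        # One gradient step on the squared prediction error for every member.
        for p in self.params:
            err = p[0] * state + p[1] * action - next_state
            p[0] -= self.lr * err * state
            p[1] -= self.lr * err * action


def intrinsic_reward(model, state, action, next_state):
    """Reward = uncertainty before the model update minus uncertainty after it."""
    before = model.uncertainty(state, action)
    model.update(state, action, next_state)
    after = model.uncertainty(state, action)
    return before - after


if __name__ == "__main__":
    model = EnsembleDynamics(seed=0)
    # Replaying the same transition: the reward shrinks as the model stops improving.
    for step in range(3):
        print(step, intrinsic_reward(model, 1.0, 0.5, 0.8))
```

In this toy setting the reward is positive while the members still disagree and shrinks as the same transition is replayed, because there is progressively less uncertainty left to remove. Rewarding the reduction in uncertainty, rather than prediction error itself, is one common way to avoid being drawn to transitions that stay unpredictable however much data is collected.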