What question did this study set out to answer?

The aim is to develop a new RL algorithm that promotes better exploration using intrinsic rewards.

March 31, 2026

Predictive Improvement through Latent Space Optimisation

Key points

The study aims to develop a new RL algorithm that improves exploration through intrinsic rewards.
It introduces PILOT, an intrinsically motivated RL algorithm.
PILOT optimizes an intrinsic reward signal based on the reduction of epistemic uncertainty.
The algorithm was evaluated against benchmark intrinsic motivation algorithms in challenging environments.
PILOT outperformed those benchmark algorithms.
PILOT remained robust to distractions in stochastic settings.

Summary

Efficient exploration remains a challenge in reinforcement learning (RL), especially in stochastic or complex environments. We introduce Predictive Improvement through Latent space OpTimisation (PILOT), an intrinsically motivated RL algorithm that rewards actions leading to improvements in the agent’s environmental dynamics model. PILOT optimizes an intrinsic reward signal based on epistemic uncertainty reduction, thereby encouraging structured exploration. Our evaluations against benchmark intrinsic motivation algorithms in challenging environments show that PILOT achieves superior performance and exhibits robustness to stochastic distractions.
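
The idea of rewarding a reduction in epistemic uncertainty can be illustrated with a small sketch. This is not the PILOT implementation: the summary does not specify the model, the latent space, or the uncertainty estimate, so the code below assumes an ensemble of linear dynamics models and uses the disagreement (variance) between their predictions as a stand-in for epistemic uncertainty. All function names, the learning rate, and the toy transition are hypothetical.

```python
import numpy as np


def ensemble_disagreement(weights, state_action):
    """Assumed uncertainty proxy: variance of next-state predictions across models."""
    preds = np.stack([w @ state_action for w in weights])  # (n_models, state_dim)
    return float(preds.var(axis=0).mean())


def sgd_step(w, state_action, next_state, lr):
    """One gradient step on the squared prediction error of a linear model."""
    error = w @ state_action - next_state
    return w - lr * np.outer(error, state_action)


def uncertainty_reduction_reward(weights, state_action, next_state, lr=0.1):
    """Intrinsic reward = disagreement before the model update minus disagreement after."""
    before = ensemble_disagreement(weights, state_action)
    updated = [sgd_step(w, state_action, next_state, lr) for w in weights]
    after = ensemble_disagreement(updated, state_action)
    return before - after, updated


# Toy usage: five randomly initialised models, one observed transition.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(2, 3)) for _ in range(5)]
x = np.array([1.0, 0.0, 0.5])   # hypothetical (state, action) features
y = np.array([0.2, -0.1])       # hypothetical observed next state

r1, weights = uncertainty_reduction_reward(weights, x, y)
r2, weights = uncertainty_reduction_reward(weights, x, y)
print(r1, r2)
```

In this sketch the reward shrinks when the same transition is visited again, because the models already agree more closely there. That is the general behaviour an improvement-based reward is meant to have; how PILOT achieves it in a latent space is not described in this summary.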
