Model-based reinforcement learning agents that plan entirely in imagination can achieve high imagined returns while completely failing the actual task, a failure mode we term the exploitation gap. We provide the first systematic characterisation of this gap in DreamerV3 on AntMaze, where the world model receives near-zero reward from real experience. Instrumenting the training loop with four new metrics, we show that the imagined-to-real reward ratio reaches approximately 50x at 500k environment steps while evaluation return stays below 0.05. We establish that KL divergence collapse is a leading indicator of exploitation onset, with an approximately 50k-step lag between the two (r = -0.91, p < 0.001), providing an actionable early-warning signal. Comparing to the hierarchical baseline THICK, we show that sparse context-kernel gating reduces but does not eliminate the gap. A dense-reward ablation confirms that a rich reward signal suppresses exploitation entirely. We propose three KL-aware mitigation strategies and release all experimental infrastructure for reproducibility.
Khassanov et al. (Thu,) studied this question.
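
To make the two headline quantities concrete, the following is a minimal sketch of how an imagined-to-real reward ratio and a lagged correlation between KL divergence and that ratio might be computed from logged training curves. It is an illustration under stated assumptions, not the instrumentation released with this work: the function names (`reward_ratio`, `lagged_correlation`, `best_lag`), the `eps` floor, the assumption that returns are non-negative, and the convention that lags are counted in logging checkpoints are all hypothetical choices made for the example.

```python
import math


def reward_ratio(imagined_returns, real_returns, eps=1e-8):
    """Per-checkpoint ratio of imagined to real return.

    Assumes non-negative returns (as in a sparse 0/1 success task); `eps`
    is a hypothetical floor that avoids division by zero when the real
    return is exactly 0.
    """
    return [im / max(re, eps) for im, re in zip(imagined_returns, real_returns)]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(kl, ratio, lag):
    """Correlate kl[t] with ratio[t + lag], with `lag` in checkpoints."""
    if len(kl) != len(ratio):
        raise ValueError("kl and ratio must be logged at the same checkpoints")
    if lag < 0 or lag > len(kl) - 2:
        raise ValueError("lag leaves fewer than 2 overlapping points")
    return pearson(kl[: len(kl) - lag], ratio[lag:])


def best_lag(kl, ratio, max_lag):
    """Lag in [0, max_lag] with the most negative correlation.

    A negative correlation at a positive lag means that a drop in KL tends
    to be followed, `lag` checkpoints later, by a rise in the ratio.
    """
    return min(range(max_lag + 1), key=lambda lag: lagged_correlation(kl, ratio, lag))
```

Converting a lag measured in checkpoints into environment steps requires multiplying by the logging interval, which this sketch does not assume. The significance test (the p-value) is also left out; a standard statistics library would normally supply it.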