
June 21, 2026 · Open Access

When the World Model Lies: Measuring and Characterising Reward Exploitation in DreamerV3 under Sparse Feedback

Key Points

This research aims to systematically characterize the exploitation gap in DreamerV3 and identify indicators of exploitation onset.
Characterized the exploitation gap in DreamerV3 on AntMaze with four new metrics.
The imagined-to-real reward ratio reached approximately 50x at 500k environment steps, while evaluation return stayed below 0.05.
KL divergence collapse was a leading indicator of exploitation onset, preceding it by approximately 50k steps (r = -0.91, p < 0.001).
Sparse context-kernel gating reduced the exploitation gap but did not eliminate it.
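
A minimal sketch of an imagined-to-real reward ratio. This page does not give the metric's exact definition, so the sketch assumes it is the mean per-step reward predicted along imagined rollouts divided by the mean per-step reward observed in real environment steps over the same window. The function name `imagined_to_real_ratio` and the floor `eps` are hypothetical.

```python
from statistics import fmean


def imagined_to_real_ratio(imagined_rewards, real_rewards, eps=1e-3):
    """Mean imagined reward divided by mean real reward.

    Hypothetical definition (assumption, not the paper's formula).
    Assumes non-negative rewards, as in sparse AntMaze, so the real mean
    is floored at `eps` to keep the ratio finite when real reward is ~0.
    """
    imagined = fmean(imagined_rewards)  # reward predicted by the world model
    real = fmean(real_rewards)          # reward returned by the environment
    return imagined / max(real, eps)


# Usage: imagined rollouts promise 0.5 per step, reality pays 0.01 per step.
ratio = imagined_to_real_ratio([0.5] * 10, [0.01] * 10)
print(f"{ratio:.1f}x")
```

Under this definition, when real reward is essentially zero the value of `eps` dominates the ratio, so any reported figure should state the floor used.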

Abstract

Model-based reinforcement learning agents that plan entirely in imagination can achieve high imagined returns while completely failing the actual task, a failure mode we term the exploitation gap. We provide the first systematic characterisation of this gap in DreamerV3 on AntMaze, where the world model receives near-zero reward from real experience. Instrumenting the training loop with four new metrics, we show that the imagined-to-real reward ratio reaches approximately 50x at 500k environment steps while evaluation return stays below 0.05. We establish that KL divergence collapse is a leading indicator of exploitation onset, preceding it by approximately 50k steps (r = -0.91, p < 0.001), which provides an actionable early-warning signal. In a comparison with the hierarchical baseline THICK, we show that sparse context-kernel gating reduces but does not eliminate the gap. A dense-reward ablation confirms that a rich reward signal suppresses exploitation entirely. We propose three KL-aware mitigation strategies and release all experimental infrastructure for reproducibility.
