We introduce UGTC (Uncertainty-Gated Temporal Credit), a modular advantage estimator for actor-critic reinforcement learning. UGTC adaptively blends fast and slow critics through an uncertainty-based gating mechanism, which allows the bias–variance trade-off to be controlled per state. UGTC integrates with PPO, TD3, SAC, and DreamerV3 through architecture-specific insertion points. Across multiple benchmarks, including MuJoCo, Procgen, MetaWorld ML45, and Crafter, UGTC improves sample efficiency and achieves performance competitive with or superior to strong baselines such as REDQ and meta-gradient methods. We also analyze failure cases and introduce a simple criterion for predicting when UGTC is beneficial. This is a preprint version of the work. The code and additional materials will be released separately.
Dalar et al. (Thu,) studied this question.
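
To make the idea of uncertainty-gated blending concrete, the following is a minimal illustrative sketch and not the UGTC implementation, since the abstract does not specify the gate or the advantage formula. It assumes a sigmoid gate on a scalar per-state uncertainty, a convex combination of the two critics' value estimates, and a one-step TD advantage. The names `gate`, `blended_values`, `td_advantages`, `threshold`, and `temperature` are hypothetical, as is the choice to favor the fast critic when uncertainty is low.

```python
import math
from typing import List, Sequence


def gate(uncertainty: float, threshold: float = 0.5, temperature: float = 0.1) -> float:
    """Hypothetical sigmoid gate in (0, 1).

    Assumption: low uncertainty puts weight near 1 on the fast critic,
    high uncertainty shifts weight toward the slow critic.
    """
    z = (uncertainty - threshold) / temperature
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(z))


def blended_values(
    v_fast: Sequence[float],
    v_slow: Sequence[float],
    uncertainty: Sequence[float],
    threshold: float = 0.5,
    temperature: float = 0.1,
) -> List[float]:
    """Per-state convex combination of fast and slow critic estimates."""
    out = []
    for vf, vs, u in zip(v_fast, v_slow, uncertainty):
        g = gate(u, threshold, temperature)
        out.append(g * vf + (1.0 - g) * vs)
    return out


def td_advantages(
    rewards: Sequence[float], values: Sequence[float], gamma: float = 0.99
) -> List[float]:
    """One-step TD advantages (an assumed, simple choice of estimator).

    `values` must hold len(rewards) + 1 entries; the last is the bootstrap value.
    """
    return [r + gamma * values[t + 1] - values[t] for t, r in enumerate(rewards)]
```

In this sketch, the blended values would be computed once per rollout and passed to `td_advantages` in place of a single critic's estimates; a multi-step estimator could be substituted without changing the gating step.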