What question did this study set out to answer?

This work aims to introduce and evaluate UGTC, a novel advantage estimator designed for actor-critic reinforcement learning algorithms.

April 25, 2026 | Open Access

UGTC: Uncertainty-Gated Temporal Credit --- A Modular Advantage Estimator for Actor-Critic RL

Key Points

UGTC combines a fast critic and a slow critic through an uncertainty-based gating mechanism.
UGTC integrates with PPO, TD3, SAC, and DreamerV3 through architecture-specific insertion points.
The evaluation covers MuJoCo, Procgen, MetaWorld ML45, and Crafter.
UGTC shows improved sample efficiency on multiple benchmarks.
Performance is competitive with or superior to strong baselines such as REDQ and meta-gradient methods.
The analysis covers failure cases and proposes a simple criterion for predicting when UGTC is beneficial.

Abstract

We introduce UGTC (Uncertainty-Gated Temporal Credit), a modular advantage estimator for actor-critic reinforcement learning. UGTC adaptively blends a fast critic and a slow critic using an uncertainty-based gating mechanism, which allows the bias–variance trade-off to be controlled per state. UGTC integrates with PPO, TD3, SAC, and DreamerV3 through architecture-specific insertion points. Across multiple benchmarks, including MuJoCo, Procgen, MetaWorld ML45, and Crafter, UGTC demonstrates improved sample efficiency and performance that is competitive with or superior to strong baselines such as REDQ and meta-gradient methods. We also analyze failure cases and introduce a simple criterion for predicting when UGTC is beneficial. This is a preprint version of the work. The code and additional materials will be released separately.
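The abstract does not specify the gating function, the uncertainty measure, or how the blended estimate enters the advantage, so the following is only an illustrative sketch of what "uncertainty-gated blending of two critics" could look like, not the authors' implementation. Everything in it is an assumption: the sigmoid gate, the `threshold` and `temperature` parameters, the choice to lean on the slow critic when uncertainty is high, and the one-step TD advantage.

```python
import numpy as np


def gate(uncertainty, threshold=0.5, temperature=0.1):
    """Hypothetical sigmoid gate in [0, 1]; higher uncertainty gives a larger gate."""
    z = (np.asarray(uncertainty, dtype=float) - threshold) / temperature
    z = np.clip(z, -50.0, 50.0)  # keep exp() well inside floating-point range
    return 1.0 / (1.0 + np.exp(-z))


def blended_value(v_fast, v_slow, uncertainty, **gate_kwargs):
    """Per-state convex blend of two critics (assumed form, not from the paper).

    A gate near 0 trusts the fast critic; a gate near 1 trusts the slow critic.
    """
    g = gate(uncertainty, **gate_kwargs)
    return (1.0 - g) * np.asarray(v_fast, dtype=float) + g * np.asarray(v_slow, dtype=float)


def one_step_advantage(rewards, values, next_values, dones, gamma=0.99):
    """Standard one-step TD advantage: r + gamma * V(s') * (1 - done) - V(s)."""
    return rewards + gamma * (1.0 - dones) * next_values - values
```

Under these assumptions, a caller would compute `blended_value` for the current and next states and pass both to `one_step_advantage`. The uncertainty input is left open; one hypothetical proxy is the absolute disagreement between the two critics, `np.abs(v_fast - v_slow)`.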
