What question did this study set out to answer?

The study aims to improve reinforcement learning by reducing the computation wasted by a fixed control frequency and the bias that accumulates in state-value estimation.

January 22, 2026

MSSEAC: Multi‐State Soft Elastic Actor‐Critic

Key Points

Targeted two problems: computation wasted by a fixed control frequency and accumulated bias in single-step TD state-value estimates.
Introduced a temporal consumption penalty mechanism for control frequency adjustment.
Developed a dual-branch output structure in the actor network to generate actions and time estimates.
Created a multi-state temporal difference framework for better state-value learning.
Implemented an innovative experience replay buffer management strategy.
Improved policy performance through autonomous control frequency adjustment.
Reduced estimation bias by fusing return distributions from multiple future states.
Stabilized early training by using historical actions, with a gradual transition to policy-generated actions in later stages.
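To make the dual-branch idea concrete, here is a minimal sketch of an actor with one shared layer feeding an action branch and a time branch, plus a reward that charges for time. It is an illustration under stated assumptions, not the authors' implementation: the function names, the weight layout, the duration bounds `min_dt` and `max_dt`, and the penalty form (a fixed cost per decision plus a cost proportional to duration) are all hypothetical.

```python
import math


def dual_branch_actor(state, w_shared, w_action, w_time, min_dt=0.01, max_dt=0.5):
    """Hypothetical two-headed actor: returns (action, duration).

    w_shared: list of weight rows mapping state -> hidden features
    w_action: list of weight rows mapping hidden -> action dimensions
    w_time:   single weight row mapping hidden -> one duration logit
    """
    # Shared features: tanh of a linear map of the state.
    hidden = [math.tanh(sum(w * s for w, s in zip(row, state))) for row in w_shared]
    # Action branch: each action component is bounded to [-1, 1].
    action = [math.tanh(sum(w * h for w, h in zip(row, hidden))) for row in w_action]
    # Time branch: a sigmoid rescaled into [min_dt, max_dt] seconds (assumed bounds).
    logit = sum(w * h for w, h in zip(w_time, hidden))
    dt = min_dt + (max_dt - min_dt) / (1.0 + math.exp(-logit))
    return action, dt


def penalized_reward(env_reward, dt, step_cost=0.05, time_cost=0.1):
    """Assumed penalty form: fixed cost per decision plus cost linear in duration."""
    return env_reward - step_cost - time_cost * dt


# Example with fixed, made-up weights.
state = [0.5, -0.2]
w_shared = [[1.0, 0.0], [0.0, 1.0]]
w_action = [[0.3, -0.7]]
w_time = [0.0, 0.0]  # zero logit places dt at the midpoint of its range

action, dt = dual_branch_actor(state, w_shared, w_action, w_time)
reward = penalized_reward(1.0, 0.2)
```

With a per-decision cost, holding an action longer means fewer charged decisions, while the duration cost discourages idling; how the paper actually balances these terms is not stated in this summary.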

Abstract

In reinforcement learning (RL), the assumption of fixed control frequency often leads to computational resource wastage and degraded policy performance, while traditional single‐step temporal difference (TD) learning suffers from accumulated state‐value estimation bias. This paper proposes the multi‐state soft elastic actor‐critic (MSSEAC) algorithm to address these issues: First, the paper introduces a temporal consumption penalty mechanism and reconstructs the actor network's dual‐branch output structure to simultaneously generate control actions and time consumption estimates, enabling autonomous control frequency adjustment. Second, the multi‐state temporal difference (MSTD) framework is developed to address the limitations of conventional single‐step TD learning. Specifically, an innovative experience replay buffer management strategy is proposed, where historical actions are utilized to stabilize the learning process during initial training phases, with a gradual transition to policy‐generated actions in later stages to enhance estimation accuracy. The multi‐state‐value estimation effectively mitigates the bias accumulation problem inherent in single‐step TD methods through weighted fusion of return distributions from multiple future states. Code is available at: https://github.com/asdwqqqq/MSSEAC.git
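The weighted fusion over multiple future states can be sketched with scalar values. The example below is a simplified stand-in, not the paper's MSTD framework: it fuses ordinary n-step TD targets with normalized weights, whereas the abstract describes fusing return distributions. The function names, the uniform default weights, the fixed per-step discount `gamma`, and the linear schedule for moving from historical to policy-generated actions are all assumptions made for illustration.

```python
def multi_state_td_target(rewards, next_values, gamma=0.99, weights=None):
    """Fuse n-step TD targets built from several future states.

    rewards[k]     : reward received on step k (k = 0 .. n-1)
    next_values[k] : critic estimate V(s_{k+1}) for the state reached after step k
    weights[k]     : non-negative fusion weight for the (k+1)-step target
    Returns (fused_target, per_horizon_targets).
    """
    if len(rewards) != len(next_values):
        raise ValueError("rewards and next_values must have equal length")
    n = len(rewards)
    if weights is None:
        weights = [1.0] * n  # assumed default: uniform fusion
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    targets = []
    discounted_sum = 0.0
    for k in range(n):
        # Accumulate discounted rewards up to and including step k.
        discounted_sum += (gamma ** k) * rewards[k]
        # Bootstrap from the value of the state reached after step k.
        targets.append(discounted_sum + (gamma ** (k + 1)) * next_values[k])

    fused = sum(w * g for w, g in zip(weights, targets)) / total_weight
    return fused, targets


def historical_action_prob(step, total_steps):
    """Assumed linear schedule: probability of using stored actions, 1.0 -> 0.0."""
    return max(0.0, 1.0 - step / total_steps)


fused, targets = multi_state_td_target([1.0, 1.0], [0.0, 0.0], gamma=0.5)
```

Averaging targets that bootstrap from different future states means an error in any single value estimate contributes only a fraction of the final target, which is the intuition behind reducing accumulated bias relative to single-step TD.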
