March 3, 2026 · Open Access

Integrating global optimum into learning-based energy management: a hybrid DRL-ECMS with behavioral cloning training and coordinated feedforward-feedback control

Key Points

Achieves a 77.56% improvement in initial reward compared to traditional DRL methods.
Implements a hierarchical framework comprising offline training and online operation for optimal performance.
Integrates behavior cloning for pre-training and adaptive policy entropy for fine-tuning the agent during training.
Reduces the battery degradation rate by 10.75% relative to single-state feedback and remains robust across diverse operating conditions.
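The two training ingredients named in the key points can be illustrated with a minimal sketch. It assumes a toy one-dimensional linear policy that maps SOC to an equivalent factor, and it assumes the temperature is adapted with the soft actor-critic style rule on the log-temperature; the paper's own network, expert data, and adaptive policy entropy rule may differ. All names, gains, and data below are hypothetical.

```python
def bc_pretrain(states, expert_actions, lr=0.5, epochs=2000):
    """Behavior cloning: fit a linear policy a = w * x + b to expert
    (state, action) pairs by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(states)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, a in zip(states, expert_actions):
            err = (w * x + b) - a
            grad_w += 2.0 * err * x / n
            grad_b += 2.0 * err / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def update_log_temperature(log_alpha, log_probs, target_entropy, lr=0.01):
    """One SAC-style step on the log-temperature (an assumed stand-in for APE).

    Minimizes J = -log_alpha * (mean_log_prob + target_entropy), so the
    temperature rises when the policy entropy estimate (-mean_log_prob)
    is below the target and falls when it is above."""
    mean_log_prob = sum(log_probs) / len(log_probs)
    grad = -(mean_log_prob + target_entropy)
    return log_alpha - lr * grad


# Hypothetical expert demonstrations: EF decreases linearly with SOC.
soc_samples = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
expert_ef = [4.0 - 2.0 * s for s in soc_samples]
w, b = bc_pretrain(soc_samples, expert_ef)
```

In this sketch the cloned parameters would serve only as the starting point of the policy before reinforcement learning begins, and the temperature step would run once per training batch.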

Abstract

To overcome the poor training efficiency and limited real-time optimality of the conventional Deep Reinforcement Learning-based Equivalent Consumption Minimization Strategy (DRL-ECMS) for ship hybrid power systems, this paper proposes a novel hierarchical framework that integrates offline training and online operation. The offline triple-layer training framework proceeds in three steps: first, the dual-state Pontryagin's Minimum Principle (DPMP) is used to derive the global optimum as expert knowledge; second, behavior cloning (BC) pre-trains the agent on these expert demonstrations to initialize the policy network; third, the adaptive policy entropy (APE) mechanism tunes the policy during formal agent training by dynamically adjusting the temperature coefficient. For online operation, a feedforward-feedback coordinated control is proposed, comprising imitation reinforcement learning (IRL) for feedforward pre-calibration of the equivalent factor (EF) and dual-state feedback (DSF) for EF correction. Results confirm the effectiveness of the proposed strategy: compared to traditional DRL, it achieves a 77.56% improvement in initial reward and a 51.78% acceleration in convergence speed. Compared to the single-state feedback (SSF) mechanism, the proposed DSF method reduces the battery degradation rate by 10.75%, increases the proportion of engine operation in the high-efficiency zone by 19.7%, and ultimately achieves 96.31% of the comprehensive performance of the global optimal benchmark. Furthermore, the robustness of the proposed strategy is validated through both cycle adaptability and state of charge (SOC) adaptability analyses.

• Proposes a novel offline-online hierarchical framework for ship hybrid power systems.
• The triple-layer training framework accelerates convergence speed by 51.78%.
• The online feedforward-feedback coordinated control achieves 96.31% of the global optimal benchmark performance.
• Robust performance is demonstrated across nine diverse operating conditions.
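The online feedforward-feedback idea described in the abstract can be sketched as follows. This is a hedged illustration, not the paper's controller: the quadratic fuel map, the proportional gains, the battery power limits, and the choice of SOC and a degradation indicator as the two feedback states are all assumptions, and the feedforward EF is a fixed stand-in for the trained agent's output.

```python
from dataclasses import dataclass

DIESEL_LHV_KJ_PER_G = 42.7  # approximate lower heating value of diesel fuel


def fuel_rate_g_per_s(p_eng_kw: float) -> float:
    """Hypothetical quadratic engine fuel map (illustrative coefficients)."""
    return 0.5 + 0.05 * p_eng_kw + 1e-4 * p_eng_kw ** 2


@dataclass
class DualStateFeedback:
    """Proportional EF correction on two states (assumed: SOC, degradation)."""
    k_soc: float = 5.0  # hypothetical gain
    k_deg: float = 1.0  # hypothetical gain

    def correct(self, ef_ff, soc, soc_ref, deg, deg_ref):
        # SOC below its reference, or degradation above its reference,
        # raises the EF so that battery discharge is penalized more.
        return ef_ff + self.k_soc * (soc_ref - soc) + self.k_deg * (deg - deg_ref)


def ecms_split(p_dem_kw, ef, p_batt_max_kw=100.0, step_kw=5.0):
    """Grid search for the split minimizing the equivalent fuel rate.

    Assumes p_dem_kw >= 0. Positive p_batt means the battery discharges."""
    best = None
    n_steps = int(2 * p_batt_max_kw / step_kw)
    for i in range(n_steps + 1):
        p_batt = -p_batt_max_kw + i * step_kw
        p_eng = p_dem_kw - p_batt
        if p_eng < 0:
            continue  # the engine cannot absorb power
        cost = fuel_rate_g_per_s(p_eng) + ef * p_batt / DIESEL_LHV_KJ_PER_G
        if best is None or cost < best[0]:
            best = (cost, p_eng, p_batt)
    return best[1], best[2]


ef_feedforward = 3.0  # stand-in for the agent's pre-calibrated EF
dsf = DualStateFeedback()
ef = dsf.correct(ef_feedforward, soc=0.55, soc_ref=0.60, deg=0.0, deg_ref=0.0)
p_eng, p_batt = ecms_split(200.0, ef)
```

A larger EF makes battery energy more expensive in the equivalent fuel cost, so the search shifts load toward the engine; the feedback term uses this to pull the tracked states back toward their references.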
