Hybrid energy storage systems (HESS), combining batteries and supercapacitors, offer an effective solution for simultaneously addressing energy and power demands in modern energy systems. However, the performance and lifetime of these systems depend strongly on the energy management strategy employed, which must balance fast power regulation, safety enforcement, and long-term battery health. Existing reinforcement learning (RL) approaches often adopt flat control architectures that entangle short-term decision-making with long-term aging effects, leading to training instability, limited interpretability, and non‑stationary reward structures. This paper proposes a hierarchical multi‑timescale reinforcement learning framework for energy management of battery-supercapacitor hybrid systems with explicit aging‑aware constraint adaptation. The proposed architecture decomposes the control problem into three coordinated layers: a fast safety‑critical layer that enforces hard operational constraints, an intermediate reinforcement learning layer based on Proximal Policy Optimization (PPO) for real‑time power allocation, and a slow supervisory layer that estimates battery state‑of‑health (SOH) using a Gaussian Process Regression (GPR) aging model. Instead of directly embedding aging costs into the learning objective, battery degradation is incorporated through SOH‑dependent adaptive current limits, ensuring quasi‑stationary learning dynamics and safe long‑term operation. Comprehensive simulation studies are conducted under nominal, progressive aging, and safety‑critical operating conditions. Under nominal SOH, the proposed framework maintains battery and supercapacitor state-of-charge (SOC) deviations within ±3% of their reference values while keeping battery current below the nominal 125 A limit.
During aging progression, the method achieves a lower time‑averaged squared battery current and reduced cumulative energy throughput compared to reward‑based Soft Actor-Critic (SAC) and flat RL baselines, resulting in a measurably slower SOH decline for equivalent stress levels. Moreover, under aggressive pulsed loads at reduced SOH, the proposed approach yields the lowest constraint activation frequency, demonstrating superior intrinsic safety and robustness. Overall, the results confirm that hierarchical time‑scale separation enables stable, aging‑aware, and physically consistent energy management without compromising learning performance.
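As a rough illustration of the constraint-adaptation idea described above, the following Python sketch shows how an SOH-dependent current limit could feed a hard clamp in the fast safety layer. The linear derating rule, the `soh_floor` and `min_fraction` parameters, and all function names are illustrative assumptions rather than the paper's actual adaptation law; only the 125 A nominal limit comes from the text. The PPO policy is represented by a single `battery_share` number, and the supercapacitor is assumed to supply whatever the battery does not, with its own limits ignored.

```python
from __future__ import annotations

I_NOMINAL_A = 125.0  # nominal battery current limit stated in the abstract


def adaptive_current_limit(soh: float,
                           i_nominal: float = I_NOMINAL_A,
                           soh_floor: float = 0.7,
                           min_fraction: float = 0.5) -> float:
    """Hypothetical linear derating of the battery current limit with SOH.

    Returns i_nominal at SOH = 1 and min_fraction * i_nominal at soh_floor.
    SOH values outside [soh_floor, 1] are clipped to that range.
    """
    soh = min(max(soh, soh_floor), 1.0)
    scale = min_fraction + (1.0 - min_fraction) * (soh - soh_floor) / (1.0 - soh_floor)
    return i_nominal * scale


def safety_layer(i_request: float, i_limit: float) -> tuple[float, bool]:
    """Clamp a requested battery current to [-i_limit, i_limit].

    The boolean reports whether the clamp changed the request, which is one
    simple way to count constraint activations.
    """
    i_safe = max(-i_limit, min(i_request, i_limit))
    return i_safe, i_safe != i_request


def allocate(load_current: float, battery_share: float, soh: float) -> tuple[float, float, bool]:
    """Split a load current between battery and supercapacitor.

    battery_share stands in for the action of the learned policy (assumed
    to lie in [0, 1]); the supercapacitor takes the remainder.
    """
    i_limit = adaptive_current_limit(soh)
    i_batt, activated = safety_layer(battery_share * load_current, i_limit)
    i_sc = load_current - i_batt
    return i_batt, i_sc, activated


if __name__ == "__main__":
    # The same 200 A pulse at full and reduced SOH: the aged battery is
    # held to a lower limit and the supercapacitor covers the difference.
    for soh in (1.0, 0.85):
        i_batt, i_sc, activated = allocate(load_current=200.0, battery_share=0.8, soh=soh)
        print(f"SOH={soh:.2f}  battery={i_batt:.2f} A  supercap={i_sc:.2f} A  clamped={activated}")
```

Under these assumptions the limit at an SOH of 0.85 is about 93.75 A, so the same request that is clipped to 125 A on a fresh battery is clipped harder on an aged one. Because aging enters only through `i_limit`, the reward seen by the policy does not need an explicit degradation term, which is the separation the abstract argues for.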
Mohammadbeigi et al. studied this question.