Deep reinforcement learning (RL) has been widely applied to power grid dispatching. However, existing online RL methods learn through direct interaction with the environment, which introduces safety risks in real-world power systems. To address this issue, we propose GSC-QL (Gated State-Calibrated Q-Learning), a safe and efficient framework for offline-to-online policy transfer in smart grid control. Specifically, a State Evolution-Aware Encoder (SEAE) monitors Q-value quality and evaluates how similar the current state is to the offline data distribution. Based on this representation, a Triple-Gating Calibration Mechanism (TGCM) generates three adaptive control signals, namely a forget gate that preserves reliable offline knowledge, an input gate that regulates the integration of online experience, and a calibration gate that dynamically adjusts conservatism near safety boundaries. Unlike conventional gating mechanisms that operate directly on observations, TGCM gates the latent Q-value knowledge state itself, which enables stable knowledge transfer. GSC-QL requires no explicit system model: it is optimized under operational constraints and learns directly from historical and online interaction data. Experimental results demonstrate that GSC-QL consistently outperforms state-of-the-art offline and online RL baselines in operational cost reduction, constraint satisfaction, and robustness to distribution shift, while maintaining stable performance throughout the offline-to-online transition.
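The abstract describes the three gates only qualitatively, so the snippet below is a hypothetical sketch of how such gating on Q-values could look, not GSC-QL's actual formulation. Every name and choice here is an assumption for illustration: `state_similarity` stands in for the SEAE output using a Gaussian kernel on the distance to the nearest offline state, the gates are plain sigmoids with hand-set weights in `GateParams`, the forget and input gates form a normalized blend of an offline and an online Q estimate, and the calibration gate scales a pessimism penalty proportional to their disagreement as the state approaches a constraint limit.

```python
import math
from dataclasses import dataclass


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def state_similarity(state, offline_states, bandwidth: float = 1.0) -> float:
    """Hypothetical SEAE stand-in: similarity in (0, 1] to the nearest offline state."""
    d2 = min(sum((a - b) ** 2 for a, b in zip(state, s)) for s in offline_states)
    return math.exp(-d2 / (2.0 * bandwidth ** 2))


@dataclass
class GateParams:
    # Hand-set, hypothetical weights; a real system would learn these.
    w_f: float = 4.0
    b_f: float = -2.0
    w_i: float = 4.0
    b_i: float = -2.0
    w_c: float = 6.0
    b_c: float = -3.0


def triple_gate(sim: float, boundary_proximity: float, p: GateParams):
    """Return (forget, input, calibration) gate values, each in (0, 1)."""
    f = sigmoid(p.w_f * sim + p.b_f)                # high in-distribution: keep offline Q
    i = sigmoid(p.w_i * (1.0 - sim) + p.b_i)        # high out-of-distribution: admit online Q
    c = sigmoid(p.w_c * boundary_proximity + p.b_c)  # high near a constraint limit
    return f, i, c


def calibrated_q(q_off: float, q_on: float, sim: float, margin: float,
                 p: GateParams, alpha: float = 1.0) -> float:
    """Blend offline/online Q estimates, then subtract a gated pessimism term.

    `margin` is an assumed normalized slack to the nearest operating limit
    (0 = at the limit, >= 1 = comfortably inside).
    """
    proximity = max(0.0, 1.0 - margin)
    f, i, c = triple_gate(sim, proximity, p)
    blend = (f * q_off + i * q_on) / (f + i)   # convex combination of the two estimates
    penalty = c * alpha * abs(q_on - q_off)    # larger disagreement -> more conservatism
    return blend - penalty


if __name__ == "__main__":
    p = GateParams()
    offline = [(0.0, 0.0), (1.0, 1.0)]
    for state in [(0.0, 0.0), (3.0, 3.0)]:
        sim = state_similarity(state, offline)
        print(state, round(sim, 4), round(calibrated_q(10.0, 12.0, sim, 2.0, p), 3))
```

In this sketch an in-distribution state keeps the blended value close to the offline estimate, an out-of-distribution state shifts it toward the online estimate, and shrinking the constraint margin lowers the value through the calibration gate. Normalizing by `f + i` is a design choice that keeps the blend on the scale of the two inputs; the actual method may combine them differently.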
Author: Jianfei Wang.