What question did this study set out to answer?

To develop a safe framework for offline-to-online transfer in smart grid control using GSC-QL.

May 29, 2026 · Open Access

GSC-QL: Gated state-calibrated Q-learning for safe offline-to-online transfer in smart grid control

Key points

Developed a Gated State-Calibrated Q-Learning (GSC-QL) framework.
Implemented a State Evolution-Aware Encoder (SEAE) and a Triple-Gating Calibration Mechanism (TGCM).
Evaluated GSC-QL against offline and online reinforcement learning baselines through various experiments.
GSC-QL significantly reduced operational costs compared to state-of-the-art methods (exact metrics not specified).
Improved constraint satisfaction rates during the transition process between offline and online environments.
Showed increased robustness to distribution shifts, maintaining stable performance throughout.

Abstract

Deep reinforcement learning (RL) has been widely applied in power grid dispatching. Existing online RL methods rely on direct interaction with the environment, which introduces safety risks in real-world power systems. To address this issue, we propose GSC-QL (Gated State-Calibrated Q-Learning), a safe and efficient framework for offline-to-online policy transfer in smart grid control. Specifically, a State Evolution-Aware Encoder (SEAE) monitors Q-value quality and evaluates the similarity between current states and the offline data distribution. Based on this representation, a Triple-Gating Calibration Mechanism (TGCM) generates three adaptive control signals, including a forget gate that preserves reliable offline knowledge, an input gate that regulates the integration of online experiences, and a calibration gate that dynamically adjusts conservatism near safety boundaries. Unlike conventional gating mechanisms that operate directly on observations, TGCM performs gating on the latent Q-value knowledge state itself, enabling stable knowledge transfer. Without requiring explicit system models, GSC-QL is optimized under operational constraints and learns directly from historical and online interaction data. Experimental results demonstrate that GSC-QL consistently outperforms state-of-the-art offline and online RL baselines in operational cost reduction, constraint satisfaction, and robustness to distribution shift, while maintaining stable performance throughout the offline-to-online transition process.
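The abstract describes the three gates only in words, so a minimal sketch of how such a gated update on a Q-value might look may help. Everything below is an assumption made for illustration and is not taken from the paper: the function names `offline_similarity` and `gated_q_update`, the nearest-neighbour similarity score, the sigmoid and exponential gate shapes, the constants, and the convention that Q is a return to be maximized (so subtracting a penalty makes the estimate more conservative). The paper's SEAE is a learned encoder and its TGCM equations are not given in this summary.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def offline_similarity(state, offline_states):
    """Hypothetical similarity score in (0, 1].

    Stands in for the learned encoder: 1 / (1 + distance to the nearest
    state in the offline dataset). Equals 1.0 for a state seen offline.
    """
    nearest = min(math.dist(state, s) for s in offline_states)
    return 1.0 / (1.0 + nearest)


def gated_q_update(q_old, td_target, similarity, safety_margin, penalty=1.0):
    """One gated update of a scalar Q-value (illustrative only).

    q_old          current Q estimate, initially from offline training
    td_target      bootstrapped target computed from an online transition
    similarity     in (0, 1], closeness of the state to the offline data
    safety_margin  distance to the nearest constraint boundary (>= 0)
    penalty        assumed size of the conservatism penalty
    """
    # Forget gate: retain more offline knowledge when the state is
    # well covered by the offline data.
    forget = sigmoid(4.0 * (similarity - 0.5))
    # Input gate: admit more online experience when coverage is poor.
    # Tying it to the forget gate (a convex mix) is a simplification.
    inp = 1.0 - forget
    # Calibration gate: approaches 1 as the safety margin shrinks to 0,
    # so the estimate becomes more pessimistic near the boundary.
    calib = math.exp(-safety_margin)
    q_new = forget * q_old + inp * td_target - calib * penalty
    return q_new, (forget, inp, calib)


if __name__ == "__main__":
    offline = [(0.0, 0.0), (1.0, 1.0)]
    for state, margin in [((0.0, 0.0), 5.0), ((4.0, 4.0), 5.0), ((0.0, 0.0), 0.0)]:
        sim = offline_similarity(state, offline)
        q_new, gates = gated_q_update(10.0, 0.0, sim, margin)
        print(state, margin, round(q_new, 3), [round(g, 3) for g in gates])
```

In this sketch, a state that appears in the offline data keeps most of its offline Q-value, a state far from the data leans on the online target instead, and a state at the constraint boundary receives the full penalty. The gating acts on the Q-value estimate itself rather than on the observation, which is the distinction the abstract draws for TGCM.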

Cite This Study

Jianfei Wang studied this question.

synapsesocial.com/papers/6a192c8bfab5b468c44156ca https://doi.org/10.1016/j.aej.2026.05.040
