March 15, 2026 | Open Access

Value‐Based Reinforcement Learning for Inter‐Area Oscillations Mitigation in PV‐Integrated Power Systems

Key Points

Aim: to develop a reinforcement learning (RL)-based control strategy for improving stability in PV-integrated power systems.
Developed an RL-based control strategy to enhance inter-area stability.
Used a value-based approach, with RL states selected for observability and controllability of the critical oscillatory modes.
Optimized hyperparameters efficiently using Taguchi orthogonal arrays.
Reduced overshoot by 55.34 and 22.58 relative to deep deterministic policy gradient (DDPG) and soft actor-critic (SAC), respectively.
Improved settling time by 6.79 and 12.6 relative to DDPG and SAC, respectively.
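To illustrate why Taguchi orthogonal arrays cut tuning effort, the following Python sketch screens four hypothetical RL hyperparameters at three levels each with the standard L9(3^4) array. This requires 9 training runs instead of the 81 a full grid would need. The factor names and values in `FACTORS` and the scoring function `mock_reward` are illustrative assumptions standing in for "train the agent and measure its performance". They are not the study's actual configuration. For each factor, the sketch picks the level with the highest mean larger-is-better signal-to-noise (S/N) ratio.

```python
import math
from collections import Counter
from itertools import combinations

# Standard Taguchi L9(3^4) orthogonal array: 9 runs, 4 factors, 3 levels (0, 1, 2).
L9 = [
    (0, 0, 0, 0),
    (0, 1, 1, 1),
    (0, 2, 2, 2),
    (1, 0, 1, 2),
    (1, 1, 2, 0),
    (1, 2, 0, 1),
    (2, 0, 2, 1),
    (2, 1, 0, 2),
    (2, 2, 1, 0),
]

# Hypothetical hyperparameter levels (illustrative only, not from the study).
FACTORS = {
    "learning_rate": [1e-4, 5e-4, 1e-3],
    "discount": [0.90, 0.95, 0.99],
    "batch_size": [32, 64, 128],
    "epsilon_decay": [0.990, 0.995, 0.999],
}


def mock_reward(levels):
    """Hypothetical stand-in for 'train the agent, return a positive score'."""
    best = (1, 2, 1, 1)
    return 10.0 - sum((lv - b) ** 2 for lv, b in zip(levels, best))


def sn_larger_is_better(y):
    """Taguchi larger-is-better S/N ratio for one replicate: -10*log10(1/y^2)."""
    return -10.0 * math.log10(1.0 / y ** 2)


def is_orthogonal(array, levels=3):
    """Every pair of columns must contain each level combination equally often."""
    for a, b in combinations(range(len(array[0])), 2):
        counts = Counter((row[a], row[b]) for row in array)
        if len(counts) != levels * levels or len(set(counts.values())) != 1:
            return False
    return True


def taguchi_select(array, factors, evaluate):
    """Pick, for each factor, the level with the highest mean S/N ratio."""
    names = list(factors)
    sn = [sn_larger_is_better(evaluate(row)) for row in array]
    best = {}
    for j, name in enumerate(names):
        means = []
        for level in range(len(factors[name])):
            vals = [s for row, s in zip(array, sn) if row[j] == level]
            means.append(sum(vals) / len(vals))
        best[name] = factors[name][means.index(max(means))]
    return best


if __name__ == "__main__":
    print(is_orthogonal(L9))
    print(taguchi_select(L9, FACTORS, mock_reward))
    print(f"{len(L9)} runs instead of {3 ** len(FACTORS)} for the full grid")
```

Each level of every factor appears equally often against every level of the others. As a result, the main-effect averages can point to a combination that was never run directly: here, levels (1, 2, 1, 1). In Taguchi practice, a confirmation run at the selected setting would normally follow.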

Abstract

The increasing integration of photovoltaic (PV) systems into interconnected power networks significantly affects transient stability by reducing system inertia and amplifying low‐frequency inter‐area oscillations. This study proposes a reinforcement learning (RL)‐based control strategy, built on a tailored value‐based approach, to enhance inter‐area stability in an interconnected power system with PV penetration. The proposed method introduces a novel, systematic RL state selection process based on participation factor and residue analysis, which ensures observability and controllability of the critical oscillatory modes. Hyperparameters are optimized using Taguchi orthogonal arrays, which identify optimal learning configurations with substantially fewer trials. Additionally, an action signal smoothing mechanism produces continuous‐like control signals suitable for static VAR compensator (SVC) actuation. This mechanism bridges the discrete action output of the critic‐only agent and the continuous control requirements of the actuator. The controller's performance is evaluated under severe fault scenarios and varying PV penetration levels. Comparative studies with two continuous actor‐critic RL methods, deep deterministic policy gradient (DDPG) and soft actor‐critic (SAC), show that the proposed approach reduces overshoot by 55.34 and 22.58 and improves settling time by 6.79 and 12.6 relative to DDPG and SAC, respectively. These results underscore its practical viability for dynamic power system stabilization.
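Participation factors and residues are standard modal-analysis quantities. Consider a linearized model dx/dt = A x + B u, y = C x, with right eigenvectors phi_i and left eigenvectors psi_i normalized so that psi_i phi_i = 1. The participation of state k in mode i is p_ki = phi_ki psi_ik. The residue R_i = (C phi_i)(psi_i B) combines the mode's observability in y with its controllability from u. The Python sketch below computes both for a hypothetical two-state swing-equation model using only the standard library. The model (`swing_model`), its parameters, and the choice of input B and output C are assumptions for illustration. The study's multi-machine model and its exact selection criteria are not reproduced here.

```python
import cmath
import math


def eig2(A):
    """Eigenvalues and right eigenvectors of a real 2x2 matrix.

    Assumes A[0][1] != 0 and distinct eigenvalues (true for an oscillatory mode).
    """
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    lams = [(tr + disc) / 2, (tr - disc) / 2]
    # First row of (A - lam*I) v = 0 gives (a - lam) v0 + b v1 = 0, so v = [b, lam - a].
    vecs = [[b, lam - a] for lam in lams]
    return lams, vecs


def modal_analysis(A, B, C):
    """Participation factors p_ki = phi_ki * psi_ik and residues R_i = (C phi_i)(psi_i B)."""
    lams, vecs = eig2(A)
    # Right-eigenvector matrix Phi: column i is the right eigenvector of mode i.
    Phi = [[vecs[0][0], vecs[1][0]], [vecs[0][1], vecs[1][1]]]
    # Psi = Phi^-1: row i is the left eigenvector of mode i, scaled so psi_i phi_i = 1.
    det = Phi[0][0] * Phi[1][1] - Phi[0][1] * Phi[1][0]
    Psi = [[Phi[1][1] / det, -Phi[0][1] / det],
           [-Phi[1][0] / det, Phi[0][0] / det]]
    modes = []
    for i, lam in enumerate(lams):
        participation = [Phi[k][i] * Psi[i][k] for k in range(2)]
        c_phi = sum(C[k] * Phi[k][i] for k in range(2))  # visibility of mode i in y
        psi_b = sum(Psi[i][k] * B[k] for k in range(2))  # excitation of mode i by u
        modes.append({"eig": lam, "participation": participation,
                      "residue": c_phi * psi_b})
    return modes


def swing_model(ws=2 * math.pi * 60, H=4.0, Ks=0.2, D=1.0):
    """Hypothetical linearized swing model with states x = [d_delta, d_omega]."""
    A = [[0.0, ws], [-Ks / (2 * H), -D / (2 * H)]]
    B = [0.0, 1.0 / (2 * H)]  # assumed control input entering the speed equation
    C = [0.0, 1.0]            # assumed speed-deviation measurement
    return A, B, C


if __name__ == "__main__":
    for m in modal_analysis(*swing_model()):
        lam = m["eig"]
        freq = abs(lam.imag) / (2 * math.pi)
        zeta = -lam.real / abs(lam)
        mags = [round(abs(p), 3) for p in m["participation"]]
        print(f"f = {freq:.3f} Hz, zeta = {zeta:.4f}, |p| = {mags}, "
              f"|R| = {abs(m['residue']):.4f}")
```

Under this reading, states with large |p_ki| in a poorly damped mode are natural candidates for the RL observation vector. A large |R_i| flags an input-output pair through which the mode is both visible and controllable. For a real inter-area study, the same formulas would be applied to the full state matrix with a numerical eigensolver.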
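The abstract does not specify how the action signal is smoothed, so the following Python sketch shows only one common option. Each discrete action index chosen by the value-based agent is mapped to an assumed SVC susceptance level (per unit, positive taken as capacitive). The result then passes through a first-order lag discretized with backward Euler, followed by rate limiting and clipping to the susceptance range. The class `ActionSmoother` and its parameters (`levels`, `dt`, `tau`, `max_rate`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ActionSmoother:
    """Hypothetical smoother turning discrete action indices into a
    continuous-like SVC susceptance command (per unit)."""
    levels: tuple = (-1.0, -0.5, 0.0, 0.5, 1.0)  # assumed discrete susceptance set
    dt: float = 0.01        # assumed control step in seconds
    tau: float = 0.05       # assumed first-order lag time constant in seconds
    max_rate: float = 10.0  # assumed rate limit in pu per second
    y: float = 0.0          # current smoothed command

    def step(self, action_index: int) -> float:
        target = self.levels[action_index]
        # Backward-Euler discretization of the lag 1 / (tau*s + 1).
        alpha = self.dt / (self.tau + self.dt)
        proposed = self.y + alpha * (target - self.y)
        # Limit how far the command may move within one control step.
        max_delta = self.max_rate * self.dt
        delta = max(-max_delta, min(max_delta, proposed - self.y))
        # Keep the command within the SVC susceptance limits.
        lo, hi = min(self.levels), max(self.levels)
        self.y = max(lo, min(hi, self.y + delta))
        return self.y


if __name__ == "__main__":
    smoother = ActionSmoother()
    # Hypothetical agent output: full capacitive for 20 steps, then full inductive.
    actions = [4] * 20 + [0] * 20
    commands = [smoother.step(a) for a in actions]
    print([round(c, 3) for c in commands])
```

With these defaults, a jump in the selected action from 0 to 1.0 pu is spread over many control steps. The first few steps are capped at 0.1 pu each by the rate limit, and the lag then approaches the target smoothly. The result is a continuous-like setpoint for the SVC even though the agent only ever picks one of five actions.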
