What question did this study set out to answer?

The aim is to improve feature extraction and scheduling efficiency in power systems using deep reinforcement learning and cloud-edge collaboration.

February 5, 2026

Adaptive extraction of multi-scale power data and cloud-edge collaborative scheduling based on deep reinforcement learning

Key Points

Divided data processing into three timescales (milliseconds to seconds, seconds to minutes, minutes to hours) and three spatial scales (measurement point, feeder, zone).
Employed a multi-layer convolutional feature encoding network for feature extraction.
Utilized a graph neural network to model grid topology and a policy gradient algorithm for policy updates.
Implemented a hierarchical parameter synchronization mechanism between cloud and edge devices.
Achieved a 77.78% data compression rate by reducing voltage and current signal dimensionality from 21 600 to 4800.
Reduced average latency from 35 ms to 12 ms and memory usage from 48 MB to 15 MB.
Maintained a robustness index above 0.82 under varying load disturbances.
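
The headline figures can be checked with simple arithmetic. The sketch below is an illustration only: it recomputes the compression rate from the dimensionalities reported in the abstract (21 600 to 4800) and derives the relative latency and memory reductions. The helper name `reduction_pct` is hypothetical, and the latency and memory percentages are derived here rather than quoted from the paper.

```python
def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction when going from `before` to `after`."""
    return (before - after) / before * 100.0

# Figures reported in the summary and abstract.
compression = reduction_pct(21_600, 4_800)  # signal dimensionality
latency = reduction_pct(35, 12)             # average edge latency, ms
memory = reduction_pct(48, 15)              # edge memory usage, MB

print(f"compression: {compression:.2f}%")   # compression: 77.78%
print(f"latency:     {latency:.2f}%")       # latency:     65.71%
print(f"memory:      {memory:.2f}%")        # memory:      68.75%
```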

Abstract

Existing power systems lack efficient feature screening and global real-time collaboration in their multi-scale data processing and dispatching architectures, which hinders safe and stable operation in dynamic environments. This paper addresses adaptive extraction of multi-scale power data and cloud-edge collaborative scheduling based on deep reinforcement learning. We explicitly divide the timescale into milliseconds to seconds, seconds to minutes, and minutes to hours, and the spatial scale into measurement-point, feeder, and zone levels, so that adaptive extraction can trade off preserving discriminative information against reducing transmission costs under limited communication bandwidth and edge computing power. Spatiotemporal coupling features are extracted from the raw voltage, current, load, and device-state sequences using a multi-layer convolutional feature encoding network, and a multi-head attention-based feature screening module dynamically assigns weights to the encoded vector to attend to key state variables. A lightweight policy network, optimized with parameter pruning and sparsification, is deployed on the edge for low-latency local state assessment and action execution, while the cloud is responsible for network-wide topology modeling and global policy optimization. Grid topology is modeled using a graph neural network to preserve topological invariance, and node coupling relationships are represented via neighborhood message passing. A policy gradient algorithm updates policies in continuous high-dimensional action spaces, and the update variance is reduced through value estimation and advantage normalization. A hierarchical parameter synchronization mechanism between the cloud and the edge exchanges compressed feature summaries, parameters, and gating thresholds at periodic or event-driven synchronization points, preserving policy convergence and state consistency.
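
To make the variance-reduction step concrete, here is a minimal sketch of advantage normalization with a value baseline. The abstract does not say which advantage estimator is used, so this assumes Monte Carlo discounted returns minus the value estimate, followed by standardization to zero mean and unit variance. The function names, the discount factor `gamma`, and the stabilizer `eps` are all hypothetical.

```python
from statistics import fmean, pstdev

def discounted_returns(rewards, gamma=0.99):
    """Discounted return G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns

def normalized_advantages(rewards, values, gamma=0.99, eps=1e-8):
    """Advantage = return - value baseline, then standardized over the batch.

    Assumption: a plain Monte Carlo estimator; the paper may use another one.
    """
    returns = discounted_returns(rewards, gamma)
    adv = [g - v for g, v in zip(returns, values)]
    mean, std = fmean(adv), pstdev(adv)
    return [(a - mean) / (std + eps) for a in adv]

# Toy trajectory: three steps of reward with a constant value estimate.
adv = normalized_advantages([1.0, 0.0, 2.0], [0.5, 0.5, 0.5])
print(adv)
```

Subtracting the value estimate removes the part of the return that the state already predicts, and standardizing keeps the gradient scale comparable across batches.
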
The decision flow forms a cooperative closed loop of short-term actions at the edge and global instructions from the cloud. In the constructed deep reinforcement learning framework, the state comprises the filtered feature summary, local latency measurements, and the cloud parameter vector, and the actions comprise both discrete feature selection gates and continuous scheduling instructions. The reward is a weighted sum of scheduling deviation, feature reconstruction error, latency, and resource consumption, with weights calibrated on the validation set so that feature extraction and scheduling decisions are optimized jointly. During feature compression, the dimensionality of the voltage and current signals is reduced from 21 600 to 4800, a 77.78% compression rate that substantially reduces data transmission pressure. During the edge inference phase, the average latency falls from 35 to 12 ms and memory usage from 48 to 15 MB, demonstrating efficiency under limited computing power. As the load disturbance increases from 0.5% to 10%, the robustness index remains above 0.82, demonstrating adaptability to complex operating conditions. The method applies to monitoring and scheduling tasks with the same sampling rate and edge device resources as the experimental platform described in this paper. It does not guarantee meeting hard real-time deadlines when the link latency exceeds 200 ms or the available memory on the edge device is less than 250 MB.
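
The weighted-sum reward described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's formulation: all four terms are treated as costs, so the reward is the negated weighted sum; the inputs are assumed to be pre-normalized to comparable ranges; and the weight values are placeholders, since the paper calibrates them on a validation set and does not report them here. The names `RewardWeights` and `reward` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RewardWeights:
    # Placeholder values; the paper calibrates weights on a validation set.
    deviation: float = 0.4
    reconstruction: float = 0.3
    latency: float = 0.2
    resource: float = 0.1

def reward(deviation, reconstruction, latency, resource,
           w: RewardWeights = RewardWeights()) -> float:
    """Negated weighted sum of four cost terms (sign is an assumption).

    Inputs are assumed to be normalized so the weights are comparable.
    """
    cost = (w.deviation * deviation
            + w.reconstruction * reconstruction
            + w.latency * latency
            + w.resource * resource)
    return -cost

# Lower latency with everything else fixed should yield a higher reward.
slow = reward(0.2, 0.1, 0.35, 0.48)
fast = reward(0.2, 0.1, 0.12, 0.15)
print(slow, fast)
```

Folding reconstruction error into the same scalar as scheduling deviation is what couples the discrete feature selection gates to the scheduling outcome during training.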
