What question did this study set out to answer?

The aim is to develop a dual-agent DRL framework for efficient low-carbon economic dispatch in microgrids.

January 25, 2026

Open Access

Dual-Agent Deep Reinforcement Learning for Low-Carbon Economic Dispatch in Wind-Integrated Microgrids Based on Carbon Emission Flow

Key Points

Implemented a dual-agent DRL framework in which a PPO agent minimizes operating cost and an SAC agent targets carbon emission reduction.
Utilized carbon emission flow theory for network-level carbon tracing.
Formulated the dispatch problem as a Markov Decision Process.
Incorporated demand response for increased operational flexibility.
Conducted case studies on a modified PJM 5-bus test system.
Achieved a 16.8% reduction in total operating cost compared with a DDPG baseline.
Reduced carbon emissions by 11.3% relative to the same baseline.
Reduced wind curtailment by 15.2% relative to the same baseline.
Demonstrated effectiveness of the dual-agent approach in renewable-rich systems.

Abstract

High renewable penetration in microgrids makes low-carbon economic dispatch under uncertainty challenging, and single-agent deep reinforcement learning (DRL) often yields unstable cost–emission trade-offs. This study proposes a dual-agent DRL framework that explicitly balances operational economy and environmental sustainability. A Proximal Policy Optimization (PPO) agent focuses on minimizing operating cost, while a Soft Actor–Critic (SAC) agent targets carbon emission reduction; their actions are combined through an adaptive weighting strategy. The framework is supported by carbon emission flow (CEF) theory, which enables network-level tracing of carbon flows, and a stepped carbon pricing mechanism that internalizes dynamic carbon costs. Demand response (DR) is incorporated to enhance operational flexibility. The dispatch problem is formulated as a Markov Decision Process, allowing the dual-agent system to learn policies through interaction with the environment. Case studies on a modified PJM 5-bus test system show that, compared with a Deep Deterministic Policy Gradient (DDPG) baseline, the proposed method reduces total operating cost, carbon emissions, and wind curtailment by 16.8%, 11.3%, and 15.2%, respectively. These results demonstrate that the proposed framework is an effective solution for economical and low-carbon operation in renewable-rich power systems.
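
The abstract does not give the form of the stepped carbon price or of the adaptive weighting, so the following is only a minimal sketch of how the two mechanisms could look. Every name and number in it (`stepped_carbon_cost`, `blend_actions`, `adaptive_weight`, the quota, the tier width, the price growth rate, and the logistic weighting rule) is a hypothetical assumption chosen for illustration and is not taken from the paper.

```python
import numpy as np


def stepped_carbon_cost(emissions, quota, step, base_price, growth):
    """Tiered carbon cost (illustrative assumption, not the paper's formula).

    Emissions above the free quota are billed in intervals of width `step`
    (must be > 0); tier k is priced at base_price * (1 + growth * k).
    """
    excess = max(emissions - quota, 0.0)
    cost, tier = 0.0, 0
    while excess > 0.0:
        billed = min(excess, step)
        cost += billed * base_price * (1.0 + growth * tier)
        excess -= billed
        tier += 1
    return cost


def adaptive_weight(emissions, quota, sharpness=5.0):
    """Hypothetical adaptation rule: weight on the cost-focused action.

    Logistic in the relative excess over the quota, so the weight drops
    below 0.5 (favoring the carbon-focused agent) as emissions rise.
    """
    rel = (emissions - quota) / quota
    return 1.0 / (1.0 + np.exp(sharpness * rel))


def blend_actions(a_cost, a_carbon, w):
    """Convex combination of the two agents' actions, with w clipped to [0, 1]."""
    w = float(np.clip(w, 0.0, 1.0))
    a_cost = np.asarray(a_cost, dtype=float)
    a_carbon = np.asarray(a_carbon, dtype=float)
    return w * a_cost + (1.0 - w) * a_carbon


if __name__ == "__main__":
    # Placeholder actions standing in for the PPO and SAC policy outputs.
    ppo_action = [0.8, 0.1]
    sac_action = [0.4, 0.5]
    w = adaptive_weight(emissions=12.0, quota=10.0)
    print("weight on cost-focused action:", w)
    print("blended dispatch action:", blend_actions(ppo_action, sac_action, w))
    print("carbon cost:", stepped_carbon_cost(25.0, 10.0, 10.0, 20.0, 0.25))
```

A convex combination keeps the blended action inside the range spanned by the two agents' proposals, which is one simple way to combine them; the paper's actual weighting strategy may differ.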

Authors

Wenjun Qiu

Nanjing University of Information Science and Technology

Hebin Ruan

Xiaoxiao Yu

Australian Regenerative Medicine Institute

Journals

Energies

Institutions

Nanjing University of Science and Technology

Nanjing University of Information Science and Technology

Global Energy Interconnection Research Institute North America
