What question did this study set out to answer?

The study aims to develop a dual-agent deep reinforcement learning (DRL) framework for efficient low-carbon economic dispatch in microgrids.

January 25, 2026 | Open Access

Dual-Agent Deep Reinforcement Learning for Low-Carbon Economic Dispatch in Wind-Integrated Microgrids Based on Carbon Emission Flow

Key Points

Implemented a dual-agent DRL framework using PPO and SAC agents.
Utilized carbon emission flow theory for network-level carbon tracing.
Formulated the dispatch problem as a Markov Decision Process.
Incorporated demand response for increased operational flexibility.
Conducted case studies on a modified PJM 5-bus test system.
Achieved a 16.8% reduction in total operating cost compared with a DDPG baseline.
Reduced carbon emissions by 11.3% relative to the same baseline.
Reduced wind curtailment by 15.2% relative to the same baseline.
Demonstrated effectiveness of the dual-agent approach in renewable-rich systems.
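
To make the network-level carbon tracing mentioned above concrete, here is a minimal sketch of the proportional-sharing idea commonly used in carbon emission flow analysis: the carbon intensity at a bus is the power-weighted average of the intensities of everything flowing into it. The function name, data layout, and the 3-bus numbers are illustrative assumptions, not the study's model or its PJM 5-bus data. The sketch also assumes a lossless network without loop flows.

```python
def nodal_carbon_intensity(branch_flow, gen_at_bus, n_bus):
    """Trace nodal carbon intensity (tCO2/MWh) by proportional sharing.

    branch_flow: {(from_bus, to_bus): MW}, positive in the flow direction
    gen_at_bus:  {bus: [(MW, tCO2_per_MWh), ...]} for local generators
    Hypothetical data layout, chosen only for this illustration.
    """
    intensity = {}
    pending = set(range(n_bus))
    while pending:
        progressed = False
        for bus in sorted(pending):
            inflows = [(i, p) for (i, j), p in branch_flow.items()
                       if j == bus and p > 0]
            if any(i not in intensity for i, _ in inflows):
                continue  # an upstream bus has not been resolved yet
            gens = gen_at_bus.get(bus, [])
            # Total active power entering the bus (branches plus generators).
            flux = sum(p for _, p in inflows) + sum(p for p, _ in gens)
            # Carbon carried in by that power.
            carbon = (sum(p * intensity[i] for i, p in inflows)
                      + sum(p * e for p, e in gens))
            intensity[bus] = carbon / flux if flux > 0 else 0.0
            pending.remove(bus)
            progressed = True
        if not progressed:
            raise ValueError("loop flow detected; a matrix solve is needed")
    return intensity


# Made-up 3-bus example: a thermal unit at bus 0, wind at bus 1, load at bus 2.
flows = {(0, 2): 60.0, (1, 2): 50.0}
gens = {0: [(100.0, 0.9)], 1: [(50.0, 0.0)]}
result = nodal_carbon_intensity(flows, gens, n_bus=3)
print(result)  # bus 2 ends up at 54/110, about 0.49 tCO2/MWh
```

In this toy case the load bus draws 60 MW of thermal power and 50 MW of wind power, so its intensity sits between the two sources in proportion to their shares.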

Abstract

High renewable penetration in microgrids makes low-carbon economic dispatch under uncertainty challenging, and single-agent deep reinforcement learning (DRL) often yields unstable cost–emission trade-offs. This study proposes a dual-agent DRL framework that explicitly balances operational economy and environmental sustainability. A Proximal Policy Optimization (PPO) agent focuses on minimizing operating cost, while a Soft Actor–Critic (SAC) agent targets carbon emission reduction; their actions are combined through an adaptive weighting strategy. The framework is supported by carbon emission flow (CEF) theory, which enables network-level tracing of carbon flows, and a stepped carbon pricing mechanism that internalizes dynamic carbon costs. Demand response (DR) is incorporated to enhance operational flexibility. The dispatch problem is formulated as a Markov Decision Process, allowing the dual-agent system to learn policies through interaction with the environment. Case studies on a modified PJM 5-bus test system show that, compared with a Deep Deterministic Policy Gradient (DDPG) baseline, the proposed method reduces total operating cost, carbon emissions, and wind curtailment by 16.8%, 11.3%, and 15.2%, respectively. These results demonstrate that the proposed framework is an effective solution for economical and low-carbon operation in renewable-rich power systems.
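
The abstract does not spell out the adaptive weighting rule or the carbon price tiers, so the following is only a sketch under stated assumptions. It shows one plausible way to blend a cost-oriented action with an emission-oriented action, and one common form of stepped carbon pricing in which each further block of excess emissions is charged at a higher rate. All function names, the weighting rule, and every parameter value (`quota`, `base_price`, `step_width`, `growth`, `sensitivity`) are hypothetical and are not taken from the study.

```python
def stepped_carbon_cost(emissions, quota, base_price, step_width, growth):
    """Price emissions above a free quota in tiers of rising unit price.

    Tier k (k = 0, 1, ...) covers `step_width` tonnes and is charged at
    base_price * (1 + growth * k). Assumed tier structure, for illustration.
    """
    excess = emissions - quota
    cost, tier = 0.0, 0
    while excess > 0:
        block = min(excess, step_width)
        cost += block * base_price * (1 + growth * tier)
        excess -= block
        tier += 1
    return cost


def adaptive_carbon_weight(emissions, quota, sensitivity=1.0):
    """Hypothetical rule: lean toward the emission agent as emissions
    rise above the quota, and toward the cost agent as they fall below."""
    w = 0.5 + sensitivity * (emissions - quota) / quota
    return min(max(w, 0.0), 1.0)  # keep the weight in [0, 1]


def blend_actions(cost_action, carbon_action, carbon_weight):
    """Convex combination of the two agents' continuous actions."""
    w = min(max(carbon_weight, 0.0), 1.0)
    return [(1 - w) * a + w * b for a, b in zip(cost_action, carbon_action)]


# Made-up numbers: 130 t emitted against a 100 t quota.
w = adaptive_carbon_weight(emissions=130.0, quota=100.0)
action = blend_actions([1.0, 0.0], [0.0, 1.0], w)
cost = stepped_carbon_cost(130.0, quota=100.0, base_price=50.0,
                           step_width=20.0, growth=0.25)
print(w, action, cost)  # cost is 20*50 + 10*50*1.25 = 1625.0
```

In practice the two action vectors would come from the trained PPO and SAC policies, and the weight could itself be learned or scheduled; the linear rule here is just the simplest choice that makes the trade-off visible.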
