What question did this study set out to answer?

How to build a multi-agent reinforcement learning framework that jointly accounts for the factors affecting residential PV-integrated energy systems (economic cost, asset degradation, and user comfort) under uncertainty.

May 9, 2026 · Open Access

Robust multi-agent reinforcement learning framework for intelligent PV-integrated smart energy systems under uncertainty

Key Points

Aimed to develop a multi-agent reinforcement learning (MARL) framework that integrates the factors affecting residential energy systems under uncertainty.
Developed a Markov game model where each prosumer acts as an autonomous agent with PV generation, battery storage, and flexible demand.
Incorporated economic costs, asset health, and comfort preservation into a unified learning objective.
Evaluated performance in simulation against a centralized benchmark under varying levels of uncertainty and community sizes.
Achieved performance competitive with the centralized benchmark while exhibiting consistent performance.
Reduced asset degradation over the course of system operation.
Effectively preserved user comfort.
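The unified learning objective described in these key points can be read as a per-agent reward that penalizes three terms at once. This summary does not state the paper's exact form, so the sketch below assumes a weighted sum of net energy cost, a battery-degradation proxy that is linear in energy throughput, and a squared comfort penalty on deviation from the user's preferred flexible load. `prosumer_reward`, `RewardWeights`, and all prices and weights are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical trade-off weights; the paper's actual values are not given here.
    cost: float = 1.0
    degradation: float = 1.0
    comfort: float = 1.0


def prosumer_reward(
    grid_import_kwh: float,
    grid_export_kwh: float,
    buy_price: float,
    sell_price: float,
    battery_throughput_kwh: float,
    degradation_cost_per_kwh: float,
    served_flexible_kwh: float,
    desired_flexible_kwh: float,
    w: RewardWeights = RewardWeights(),
) -> float:
    """Single-step reward for one prosumer agent (higher is better).

    Assumed components (a sketch, not the paper's formulation):
      - economic cost: pay for imports, earn for exports
      - degradation: linear in battery energy throughput (a simple proxy)
      - comfort: squared gap between served and preferred flexible demand
    """
    energy_cost = buy_price * grid_import_kwh - sell_price * grid_export_kwh
    degradation_cost = degradation_cost_per_kwh * battery_throughput_kwh
    comfort_penalty = (desired_flexible_kwh - served_flexible_kwh) ** 2
    # Negate the weighted total so that lower combined cost means higher reward.
    return -(
        w.cost * energy_cost
        + w.degradation * degradation_cost
        + w.comfort * comfort_penalty
    )
```

For example, importing 2 kWh at 0.30, exporting 0.5 kWh at 0.10, cycling 1.5 kWh through the battery at 0.05 per kWh, and serving 3.0 of a preferred 3.5 kWh gives a reward of about -0.875 (0.55 + 0.075 + 0.25). Increasing one weight makes that term dominate the trade-off the agent optimizes.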

Abstract

The increasing penetration of residential photovoltaics (PV), energy storage, and flexible demand introduces significant uncertainty, coordination challenges, and long-term asset degradation in smart energy communities. Existing residential energy management approaches often rely on deterministic optimization or single-agent learning, which limits their robustness, scalability, and ability to balance economic performance, asset health, and user comfort under stochastic operating conditions. This paper proposes a unified, practically oriented integration of uncertainty, asset degradation, comfort constraints, and peer-to-peer (P2P) energy exchange within a multi-agent reinforcement learning (MARL) framework for residential energy communities. The community is formulated as a Markov game in which each prosumer operates as an autonomous agent with PV generation, battery storage, and flexible demand. Economic cost, comfort preservation, and asset degradation are incorporated into a single learning objective, enabling decentralized yet coordinated decision-making through shared interactions with the environment. Simulation results under varying levels of uncertainty and community sizes show that the proposed framework achieves performance competitive with a centralized benchmark while exhibiting consistent performance, reduced asset degradation, and effective comfort preservation.
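The abstract formulates the community as a Markov game in which prosumer agents with PV, storage, and flexible demand interact through P2P exchange under uncertainty. The toy environment below illustrates that structure under stated assumptions; it is not the paper's model. It assumes one-hour steps, Gaussian noise on PV and base load, pro-rata P2P matching at a fixed price, and a reward that covers energy cost only, while battery throughput (a degradation proxy) and unserved flexible demand (a comfort proxy) are returned in `info` so they can be folded into a single objective. `CommunityMarkovGame`, `Prosumer`, and every parameter and price are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Prosumer:
    # Hypothetical battery parameters (kWh, kW); not taken from the paper.
    capacity_kwh: float = 10.0
    soc_kwh: float = 5.0
    max_power_kw: float = 3.0


class CommunityMarkovGame:
    """Toy Markov game: one agent per prosumer, assumed one-hour time steps.

    Action per agent: (battery_kw, flex_fraction). battery_kw > 0 charges and
    < 0 discharges; flex_fraction in [0, 1] is the share of flexible demand served.
    Surplus is matched to deficits peer-to-peer (pro rata) before using the grid.
    """

    def __init__(self, n_agents, pv_noise=0.2, load_noise=0.2, seed=0,
                 buy_price=0.30, sell_price=0.05, p2p_price=0.15,
                 pv_nominal=2.0, load_nominal=1.0, flex_kwh=1.0):
        self.rng = random.Random(seed)
        self.agents = [Prosumer() for _ in range(n_agents)]
        self.pv_noise, self.load_noise = pv_noise, load_noise
        self.buy_price, self.sell_price, self.p2p_price = buy_price, sell_price, p2p_price
        self.pv_nominal, self.load_nominal, self.flex_kwh = pv_nominal, load_nominal, flex_kwh

    def _sample_exogenous(self):
        # Uncertainty: Gaussian noise around nominal PV and base load, clipped at zero.
        pv = [max(0.0, self.rng.gauss(self.pv_nominal, self.pv_nominal * self.pv_noise))
              for _ in self.agents]
        load = [max(0.0, self.rng.gauss(self.load_nominal, self.load_nominal * self.load_noise))
                for _ in self.agents]
        return pv, load

    def step(self, actions):
        pv, load = self._sample_exogenous()
        net, throughput, shortfall = [], [], []
        for agent, (battery_kw, flex_frac), g, b in zip(self.agents, actions, pv, load):
            # Respect the power limit, then the stored energy / remaining headroom.
            p = max(-agent.max_power_kw, min(agent.max_power_kw, battery_kw))
            p = max(-agent.soc_kwh, min(agent.capacity_kwh - agent.soc_kwh, p))
            agent.soc_kwh += p
            flex = max(0.0, min(1.0, flex_frac)) * self.flex_kwh
            throughput.append(abs(p))               # degradation proxy
            shortfall.append(self.flex_kwh - flex)  # comfort proxy
            net.append(g - b - flex - p)            # > 0 surplus, < 0 deficit
        supply = sum(x for x in net if x > 0)
        demand = -sum(x for x in net if x < 0)
        traded = min(supply, demand)                # energy exchanged P2P
        rewards = []
        for x in net:
            if x > 0:    # seller: P2P share first, remainder exported to the grid
                p2p = traded * x / supply
                cost = -self.p2p_price * p2p - self.sell_price * (x - p2p)
            elif x < 0:  # buyer: P2P share first, remainder imported from the grid
                p2p = traded * -x / demand
                cost = self.p2p_price * p2p + self.buy_price * (-x - p2p)
            else:
                cost = 0.0
            rewards.append(-cost)
        obs = [(a.soc_kwh, g, b) for a, g, b in zip(self.agents, pv, load)]
        info = {"p2p_kwh": traded, "throughput_kwh": throughput,
                "comfort_shortfall_kwh": shortfall}
        return obs, rewards, info
```

Setting `pv_noise` and `load_noise` to zero makes a step deterministic, which is convenient for checking the market logic; raising them, or `n_agents`, loosely mimics the varying uncertainty levels and community sizes the abstract describes. Pro-rata matching is used here only because it is simple and always clears the smaller side of the local market.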
