This study proposes a continuous-action multi-agent reinforcement learning (MARL) controller for an integrated photovoltaic-thermal (PVT), air-to-water heat pump (AWHP), and stratified storage system. Three proximal policy optimization (PPO) agents regulate the PVT, AWHP, and fan coil unit (FCU) flow rates at 60-s intervals under centralized training and decentralized execution. The controller minimizes tariff-weighted energy cost while maintaining comfort and constraint compliance, supported by uniform safety bounds and slew-rate limits. A year-long simulation of a reference office building in Busan compares PPO with a supervised deep neural network (DNN), a Dueling DQN agent, and rule-based control. PPO consistently yields smoother actions, preserves stratification, and reduces pumping, with 18-35% lower flow rates across subsystems and no comfort degradation. PVT electrical and thermal efficiencies remain stable, and AWHP operation avoids boundary saturation. Economically, PPO achieves the shortest payback (15.4 years) and the lowest 20-year life-cycle cost of all the controllers compared. The results indicate that continuous-action MARL enables more efficient, storage-aware coordination than discrete RL or supervised methods.
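The safety bounds and slew-rate limits mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `apply_safety_layer`, its parameters, and the numeric values in the usage note are hypothetical, chosen only to show how a continuous flow-rate command might be clamped to a range and rate-limited between 60-s control steps.

```python
def apply_safety_layer(raw_action, prev_flow, flow_min, flow_max, max_step):
    """Clamp a raw policy output to bounds and limit its change per step.

    All names and units are illustrative assumptions, not taken from the paper.
    Assumes prev_flow already lies within [flow_min, flow_max].
    """
    # Safety bound: restrict the requested flow to the allowed range.
    target = min(max(raw_action, flow_min), flow_max)
    # Slew-rate limit: cap the change relative to the previous command.
    delta = min(max(target - prev_flow, -max_step), max_step)
    return prev_flow + delta
```

For example, with hypothetical bounds of 0.0 to 0.8 and a maximum step of 0.1 per interval, a raw request of 1.0 from a previous flow of 0.2 is first clamped to 0.8 and then rate-limited, so the command moves by only 0.1 in that interval.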
Chae et al. studied this question.