What question did this study set out to answer?

This research aims to develop a reinforcement learning controller for optimizing energy use in integrated thermal systems.

June 17, 2026

Study on Reinforcement Learning-Based Control for PVT-Heat Pump Integrated Thermal Storage

Key Points

Used continuous-action multi-agent reinforcement learning (MARL), with three PPO agents controlling the PVT, AWHP, and FCU flow rates.
Ran a year-long simulation of the system in a reference office building in Busan.
Compared performance against supervised DNN, Dueling DQN, and rule-based control systems.
PPO reduced flow rates by 18-35% across subsystems relative to the baselines, without degrading comfort.
Achieved the shortest payback period of 15.4 years and the lowest 20-year life-cycle cost.
Produced consistently smoother control actions while keeping PVT electrical and thermal efficiencies stable.

Abstract

This study proposes a continuous-action multi-agent reinforcement learning (MARL) controller for an integrated photovoltaic-thermal (PVT), air-to-water heat pump (AWHP), and stratified storage system. Three proximal policy optimization (PPO) agents regulate the PVT, AWHP, and fan coil unit (FCU) flow rates at 60-s intervals under centralized training and decentralized execution. The controller minimizes tariff-weighted energy cost while maintaining comfort and constraint compliance, supported by uniform safety bounds and slew-rate limits. A year-long simulation of a reference office building in Busan compares PPO with a supervised deep neural network (DNN), a Dueling DQN agent, and rule-based control. PPO consistently yields smoother actions, preserves stratification, and reduces pumping (18-35% lower flow rates across subsystems) without comfort degradation. PVT electrical and thermal efficiencies remain stable, and AWHP operation avoids boundary saturation. Economically, PPO achieves the shortest payback (15.4 years) and the lowest 20-year life-cycle cost, outperforming all baselines. The results demonstrate that continuous-action MARL enables more efficient, storage-aware coordination than discrete RL or supervised methods.
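
To make the roles of the safety bounds, slew-rate limits, and tariff-weighted cost more concrete, here is a minimal Python sketch. It is not the study's implementation. The names `FlowLimits`, `project_action`, and `tariff_weighted_cost`, the units, the numeric values, and the choice to apply the slew-rate limit before the bounds are all assumptions made for illustration, since the abstract does not specify them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FlowLimits:
    """Hypothetical limits for one flow-rate actuator (assumed units: L/min)."""
    lo: float        # lower safety bound
    hi: float        # upper safety bound
    max_step: float  # largest allowed change per 60-s control step


def project_action(raw: float, previous: float, limits: FlowLimits) -> float:
    """Limit the change from the previous setpoint, then clip to the bounds."""
    step = max(-limits.max_step, min(limits.max_step, raw - previous))
    return max(limits.lo, min(limits.hi, previous + step))


def tariff_weighted_cost(
    power_kw: list[float],
    tariff_per_kwh: list[float],
    dt_s: float = 60.0,
) -> float:
    """Sum over steps of energy used (kWh) times the tariff at that step."""
    hours = dt_s / 3600.0
    return sum(p * hours * t for p, t in zip(power_kw, tariff_per_kwh))


# Illustrative values only; none of these numbers come from the study.
limits = FlowLimits(lo=2.0, hi=20.0, max_step=1.5)
upward = project_action(raw=18.0, previous=10.0, limits=limits)   # slew-limited
downward = project_action(raw=0.0, previous=2.5, limits=limits)   # bound-limited
cost = tariff_weighted_cost([6.0, 3.0], [0.10, 0.20], dt_s=1800.0)
```

In this sketch a raw agent output of 18.0 from a previous setpoint of 10.0 is held to 11.5 by the slew-rate limit, and a request of 0.0 from 2.5 is held at the lower bound of 2.0. A projection of this kind is one simple way to keep continuous actions smooth and away from boundary saturation; a negative tariff-weighted cost could then serve as one term of a reward signal.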
