What question did this study set out to answer?

To evaluate advanced deep reinforcement learning algorithms for controlling residential heat pumps while minimizing energy costs and thermal discomfort.

May 9, 2026 · Open Access

Advanced Deep Reinforcement Learning for Heat Pump Control in Residential Buildings

Key Points

Compared eight control strategies: three rule-based controllers (fuzzy, PI, and PID), a model-predictive controller (MPC), and four deep reinforcement learning algorithms (A2C, DDPG, PPO, and SAC).
Utilized the LLECBuildingGym for modeling and testing in a 1R1C thermal building model.
Conducted an extensive ablation study to identify optimal RL configurations.
The best-performing RL configuration reduced cost by 6% compared to rule-based controllers.
It outperformed the MPC by 1% and underperformed an MPC with perfect prediction by less than 4%.

Abstract

Residential heating is a significant contributor to carbon emissions. Replacing conventional on/off and heating curve controls with smart strategies is essential for decarbonization. This paper presents eight state-of-the-art control strategies for residential air-source heat pumps in the open-source environment LLECBuildingGym, which emulates the heat pump house at the Living Lab Energy Campus (LLEC). We compare three rule-based controllers (fuzzy, PI, and PID), a model-predictive controller (MPC), and four advanced deep reinforcement learning (RL) algorithms (A2C, DDPG, PPO, and SAC) in a 1R1C thermal building model with continuous heating and cooling control. The model captures nonlinear thermal dynamics using Euler discretization, models sensor uncertainties as reflected Wiener processes, and integrates dynamic electricity tariffs. We define single-objective (temperature) and multi-objective tasks that minimize thermal discomfort and energy costs. An extensive ablation study identifies the best-performing RL algorithm configuration, which reduces cost by 6% compared to rule-based controllers, outperforms MPC by 1%, and underperforms MPC with perfect prediction by less than 4%.
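To make the modeling terms in the abstract concrete, the following is a minimal sketch of a 1R1C forward-Euler update, a reflected Wiener process for sensor noise, and a per-step multi-objective cost. It is not taken from LLECBuildingGym: every function name, parameter value (thermal resistance, capacitance, time step, constant COP, noise bound, discomfort weight), and the exact form of the cost are assumptions chosen for illustration, and the simple linear 1R1C equation used here may differ from the paper's formulation.

```python
import numpy as np


def euler_step_1r1c(t_in, t_out, q_hp, r=5.0e-3, c=1.0e7, dt=900.0):
    """One forward-Euler step of C * dT/dt = (T_out - T_in) / R + Q_hp.

    t_in, t_out: indoor and outdoor temperature in degrees Celsius.
    q_hp: thermal power in W (positive heats, negative cools).
    r [K/W], c [J/K], and dt [s] are assumed illustrative values.
    """
    return t_in + dt / c * ((t_out - t_in) / r + q_hp)


def reflect(x, low, high):
    """Fold a scalar x back into [low, high] by mirroring at the bounds."""
    span = high - low
    y = (x - low) % (2.0 * span)
    return low + (2.0 * span - y if y > span else y)


def reflected_wiener(n_steps, dt, sigma, bound, rng):
    """Sample a Wiener path that is reflected to stay in [-bound, bound]."""
    path = np.empty(n_steps)
    w = 0.0
    for k in range(n_steps):
        # Gaussian increment with variance sigma^2 * dt, then reflection.
        w = reflect(w + sigma * np.sqrt(dt) * rng.standard_normal(), -bound, bound)
        path[k] = w
    return path


def step_cost(t_in, t_set, q_hp, price, cop=3.0, dt=900.0, weight=1.0):
    """Electricity cost plus weighted thermal discomfort for one step.

    price is a (possibly time-varying) tariff per kWh; the constant COP
    and the absolute-deviation discomfort term are assumptions.
    """
    energy_kwh = abs(q_hp) / cop * dt / 3.6e6  # electrical energy in kWh
    return price * energy_kwh + weight * abs(t_in - t_set)
```

In a sketch like this, a controller (rule-based, MPC, or an RL policy) would choose `q_hp` at each step from a noisy temperature reading, for example `t_in + noise[k]`, and would be scored by summing `step_cost` over the episode with the tariff value for that step.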
