Deep reinforcement learning (DRL)-based HVAC control has shown clear advantages over rule-based and model predictive methods. However, most prior studies remain limited to HVAC-only optimization or simple coordination with operable windows. Such approaches do not adequately address buildings with fixed glazing systems, a common feature in high-rise offices, where the lack of operable windows restricts adaptive envelope interaction. To address this gap, this study proposes a multi-zone control framework that integrates HVAC systems with electrochromic windows (ECWs). The framework uses the Q-value Mixing (QMIX) algorithm to dynamically coordinate ECW transmittance with HVAC setpoints, aiming to improve energy efficiency and thermal comfort, particularly in high-consumption buildings such as offices. Its performance is evaluated using EnergyPlus simulations. The results show that the proposed approach reduces HVAC energy use by 19.8% compared with DQN-based HVAC-only control and by 40.28% relative to conventional rule-based control (RBC). Compared with leading multi-agent deep reinforcement learning (MADRL) algorithms, including MADQN, VDN, and MAPPO, the framework reduces HVAC energy consumption by 1–5% and maintains a thermal comfort violation rate (TCVR) of less than 1% with an average temperature variation of 0.35 °C. Moreover, the model demonstrates strong generalizability, achieving 16.58–58.12% energy savings across six distinct climatic regions, ranging from tropical (Singapore) to temperate (Beijing), with up to 48.2% savings observed in Chengdu. These results indicate the potential of coordinating HVAC systems with ECWs in simulation, and the study also identifies limitations that need to be addressed before real-world deployment.
Chen et al. (Sun,) studied this question.