Adaptive Policy Regularization for Offline-to-Online Reinforcement Learning in HVAC Control (Papers Track)
Hsin-Yu Liu (University of California San Diego); Bharathan Balaji (Amazon); Rajesh Gupta (UC San Diego); Dezhi Hong (Amazon)
Abstract
Reinforcement learning (RL)-based control methods have been extensively studied to improve the efficiency of building heating, ventilation, and air conditioning (HVAC) systems. Data-driven approaches offer better transferability and scalability, making them useful in real-world applications. Most prior work focuses on online learning, which requires simulators or models of the environment dynamics; however, transferring thermal simulators between environments is inefficient. We build on recent works that train offline on static datasets collected by unknown policies. Because pure offline RL is constrained by the distribution of the replay buffer, we propose offline-to-online RL to enhance pre-trained offline models through online adaptation to distribution shifts. We show that direct online fine-tuning degrades the performance of offline-trained policies. To address this, we propose automatically tuning the actor's regularization weight during training to balance the exploration-exploitation tradeoff. Specifically, we leverage simple moving averages of the mean Q-values sampled throughout training. Simulation experiments demonstrate that our method outperforms state-of-the-art approaches under various conditions, improving performance by 32.9% and enhancing the capabilities of pre-trained models online.
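To make the adaptive-regularization idea concrete, below is a minimal illustrative sketch (not the authors' exact update rule): the actor's regularization weight is adjusted by comparing simple moving averages of mean Q-values collected during online fine-tuning. All names and hyperparameters (`alpha`, the window size, the step size) are assumptions chosen for illustration.

```python
# Sketch only: adapt a policy-constraint weight from a simple moving
# average (SMA) of mean Q-values seen during online fine-tuning.
# The specific update rule and constants are illustrative assumptions,
# not the paper's exact algorithm.
from collections import deque


class AdaptiveRegularizer:
    def __init__(self, alpha_init=2.5, window=1000, step=0.05,
                 alpha_min=0.0, alpha_max=10.0):
        self.alpha = alpha_init            # weight on the actor's regularization term
        self.window = window               # SMA window length (in training batches)
        self.step = step                   # how much to adjust alpha per update
        self.alpha_min, self.alpha_max = alpha_min, alpha_max
        self.q_history = deque(maxlen=2 * window)

    def update(self, batch_mean_q: float) -> float:
        """Record the batch's mean Q-value and adjust the regularization weight."""
        self.q_history.append(batch_mean_q)
        if len(self.q_history) < 2 * self.window:
            return self.alpha  # not enough data yet to form two SMA windows

        history = list(self.q_history)
        sma_recent = sum(history[-self.window:]) / self.window
        sma_earlier = sum(history[:self.window]) / self.window

        if sma_recent > sma_earlier:
            # Q-values trending up: trust online data more, relax the constraint.
            self.alpha = max(self.alpha_min, self.alpha - self.step)
        else:
            # Q-values flat or dropping: tighten the constraint toward the offline policy.
            self.alpha = min(self.alpha_max, self.alpha + self.step)
        return self.alpha
```

In such a scheme, the returned `alpha` would scale the regularization term in the actor loss at each gradient step, loosening the constraint as online Q-value estimates improve and tightening it when they deteriorate.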