Recent advancements in model-free control strategies, such as reinforcement learning (RL), have led to more practical and scalable solutions for building energy system controls. These strategies do not require complex models of building dynamics and rely exclusively on data to learn the control policy. Applications of these techniques in heating, ventilation, and air-conditioning (HVAC) systems are being studied under different operational scenarios, including demand response programs. Conventional (unconstrained) reinforcement learning controllers often address indoor comfort constraints by incorporating a comfort violation penalty in the reward function. While this approach can result in good performance in terms of energy cost, it often leads to significant constraint violations when a small penalty factor is used. Conversely, effective enforcement of constraints can be achieved, but at the cost of degraded economic performance. Hence, a clear trade-off between economic performance and constraint satisfaction poses a challenge to overcome. Motivated by this challenge, this thesis presents a constrained RL-based control strategy for building demand response. The proposed strategy handles the constraints explicitly, avoiding the use of arbitrarily set penalty factors that can significantly impact control performance. To demonstrate its efficacy, simulation tests of the proposed strategy, as well as baseline model predictive control (MPC) and conventional (unconstrained) policy optimization methods, were conducted. The simulation tests showed that the constrained RL strategy achieved utility cost savings of up to 16.1%, similar to the MPC baselines, without requiring any model of the building and with minimal constraint violation. In contrast, the unconstrained RL controllers led to either high utility costs or constraint violations, depending on the penalty factor setting.
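The trade-off described above can be made concrete with a minimal sketch. The first function shows the conventional penalized-reward formulation, where a hand-tuned `penalty_factor` folds comfort violations into the reward; the second shows the Lagrangian-style idea behind constrained RL, where the multiplier is adapted from observed violations rather than fixed in advance. All names, bounds, and values here are illustrative assumptions, not the thesis's actual formulation.

```python
def penalized_reward(energy_cost, temp, t_min=20.0, t_max=24.0,
                     penalty_factor=10.0):
    """Unconstrained RL reward: negative energy cost minus a
    hand-tuned penalty on degrees outside the comfort band.
    A small penalty_factor invites violations; a large one
    sacrifices cost performance."""
    violation = max(t_min - temp, 0.0) + max(temp - t_max, 0.0)
    return -energy_cost - penalty_factor * violation


def update_dual_variable(lmbda, avg_violation, limit=0.0, lr=0.1):
    """Constrained (Lagrangian) RL sketch: instead of fixing the
    penalty, a dual variable grows while the average comfort
    violation exceeds its limit and decays toward zero otherwise,
    so the effective penalty is learned, not hand-set."""
    return max(lmbda + lr * (avg_violation - limit), 0.0)
```

For example, with the hypothetical defaults, a comfortable state at 22 °C incurs only the energy cost, while 25 °C (one degree above the band) adds a penalty of 10; the dual update would instead raise the multiplier gradually only while violations persist.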