Enhancing Sustainability in Liquid-Cooled Data Centers with Reinforcement Learning Control (Papers Track)

Avisek Naug (Hewlett Packard Enterprise); Antonio Guillen-Perez (Hewlett Packard Enterprise); Vineet Gundecha (Hewlett Packard Enterprise); Ricardo Luna Gutierrez (Hewlett Packard Enterprise); Ashwin Ramesh Babu (Hewlett Packard Enterprise Labs); Sajad Mousavi (Hewlett Packard Enterprise); Paolo Faraboschi (Hewlett Packard Enterprise); Cullen Bash (Hewlett Packard Enterprise); Soumyendu Sarkar (Hewlett Packard Enterprise)

NeurIPS 2024 · Paper PDF · Slides · Poster · Recorded Talk
Topics: Power & Energy · Reinforcement Learning

Abstract

The growing energy demands of machine learning workloads call for sustainable data centers with lower carbon footprints and reduced energy consumption. Supercomputing and many high-performance computing (HPC) data centers already use liquid cooling for greater efficiency than traditional air cooling, and they stand to benefit significantly from advanced optimization techniques for controlling that cooling. We present RL-LC, a novel Reinforcement Learning (RL)-based approach designed to enhance the efficiency of liquid cooling in these environments. RL-LC integrates a customizable analytical liquid cooling model suitable for simulations or digital twins of data centers and focuses on minimizing energy consumption and carbon emissions. Our method achieves an average reduction of approximately 4% in these metrics compared to industry-standard ASHRAE guidelines, contributing to more sustainable data center management and offering valuable insights for reducing the environmental impact of HPC operations.
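The abstract does not spell out the controller formulation, but the general setup it describes (an RL agent acting on a liquid cooling model to reduce energy and carbon) can be sketched as a standard control loop. The toy environment below is a minimal, hypothetical illustration only: the class name `ToyLiquidCoolingEnv`, the plant coefficients, the setpoint range, and the reward shape are all assumptions for illustration and are not the RL-LC model or reward from the paper.

```python
# Hypothetical sketch of an RL control loop for liquid-cooling setpoints.
# All coefficients, names, and the reward shape are illustrative assumptions,
# not the RL-LC formulation from the paper.
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class ToyLiquidCoolingEnv(gym.Env):
    """Toy environment: each step the agent picks a coolant supply-temperature
    setpoint; a simplified plant model returns cooling power and rack
    temperature, and the reward trades off energy use against a thermal limit."""

    def __init__(self, max_rack_temp_c=45.0):
        super().__init__()
        self.max_rack_temp_c = max_rack_temp_c
        # Action: supply-temperature setpoint in [18 C, 32 C] (an ASHRAE-like band).
        self.action_space = spaces.Box(low=18.0, high=32.0, shape=(1,), dtype=np.float32)
        # Observation: [IT load (kW), rack outlet temperature (C)].
        self.observation_space = spaces.Box(
            low=np.array([0.0, 0.0]), high=np.array([500.0, 100.0]), dtype=np.float32
        )
        self.rng = np.random.default_rng(0)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.it_load_kw = 200.0
        self.rack_temp_c = 35.0
        return self._obs(), {}

    def _obs(self):
        return np.array([self.it_load_kw, self.rack_temp_c], dtype=np.float32)

    def step(self, action):
        setpoint_c = float(np.clip(action[0], 18.0, 32.0))
        # Toy plant: rack temperature relaxes toward setpoint plus a load-dependent rise.
        self.rack_temp_c += 0.5 * (setpoint_c + 0.05 * self.it_load_kw - self.rack_temp_c)
        # Toy cooling power: colder setpoints cost more pump/chiller energy.
        cooling_kw = 0.08 * self.it_load_kw * (32.0 - setpoint_c) / 14.0 + 2.0
        # Reward: minimize cooling energy, heavily penalize thermal violations.
        reward = -cooling_kw
        if self.rack_temp_c > self.max_rack_temp_c:
            reward -= 100.0
        # Random walk on IT load to mimic a varying workload.
        self.it_load_kw = float(np.clip(self.it_load_kw + self.rng.normal(0, 5.0), 50.0, 500.0))
        return self._obs(), reward, False, False, {}


if __name__ == "__main__":
    env = ToyLiquidCoolingEnv()
    obs, _ = env.reset()
    total = 0.0
    for _ in range(100):  # random-policy rollout as a smoke test
        obs, r, *_ = env.step(env.action_space.sample())
        total += r
    print(f"return under random policy: {total:.1f}")
```

In a setup like this, an off-the-shelf RL algorithm (e.g., PPO) could be trained against the environment and its episode energy compared with a fixed ASHRAE-style setpoint baseline, which mirrors the kind of comparison the abstract reports; the paper itself should be consulted for the actual model, reward, and evaluation protocol.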