• Improving the Performance and Energy Efficiency for Power-constrained High Performance Computing

      He, Xubin; Kant, Krishna; Ji, Bo, 1982-; Xiao, Weijun; Zhao, Zhigen (Temple University. Libraries, 2017)
      The continuous growth in computing capability has accelerated scientific discovery and enabled scientific applications to simulate physical phenomena at ever larger problem sizes. However, as computing capability escalates, power constraints are becoming a first-order concern for high performance computing (HPC) facilities. For example, the U.S. Department of Energy has set a power constraint of 20 MW for each exascale machine. How to achieve target performance under such power constraints remains an open issue. Efficient operation of these facilities therefore requires power constraints to be taken into account at all layers of the system, which can affect both performance and energy efficiency. To improve the performance and energy efficiency of computing and storage resources under power constraints, I propose the following three techniques. First, I developed a power-aware checkpointing model by exploring the interplay among power capping, temperature, reliability, performance, and energy efficiency. Applying this model maximizes performance and energy efficiency while minimizing data movement across storage systems. Second, I characterized the performance and energy efficiency of HPC workflows on heterogeneous processors. I also characterized how scientific simulation and analysis respond differently to power capping, and how their behavior varies with error resilience. Based on this characterization of HPC workflows, I developed a reliability-aware platform configuration model that determines the optimal platform configuration for power-constrained HPC workflows, including power allocation and distribution, power capping levels, and computing scales. Third, I developed a proactive burst buffer draining scheme that minimizes the I/O provisioning requirement of permanent storage systems while preserving system I/O performance. Under power constraints, reducing the storage provisioning level directly decreases the power consumption of storage systems. Applying the proactive burst buffer draining scheme therefore minimizes the storage provisioning level and power consumption without compromising storage I/O performance.