skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Measuring and tuning energy efficiency on large scale high performance computing platforms.

Technical Report ·
DOI:https://doi.org/10.2172/1035312· OSTI ID:1035312

Recognition of the importance of power in the field of High Performance Computing, whether it be as an obstacle, expense or design consideration, has never been greater and more pervasive. While research has been conducted on many related aspects, there is a stark absence of work focused on large scale High Performance Computing. Part of the reason is the lack of measurement capability currently available on small or large platforms. Typically, research is conducted using coarse methods of measurement such as inserting a power meter between the power source and the platform, or fine grained measurements using custom instrumented boards (with obvious limitations in scale). To collect the measurements necessary to analyze real scientific computing applications at large scale, an in-situ measurement capability must exist on a large scale capability class platform. In response to this challenge, we exploit the unique power measurement capabilities of the Cray XT architecture to gain an understanding of power use and the effects of tuning. We apply these capabilities at the operating system level by deterministically halting cores when idle. At the application level, we gain an understanding of the power requirements of a range of important DOE/NNSA production scientific computing applications running at large scale (thousands of nodes), while simultaneously collecting current and voltage measurements on the hosting nodes. We examine the effects of both CPU and network bandwidth tuning and demonstrate energy savings opportunities of up to 39% with little or no impact on run-time performance. Capturing scale effects in our experimental results was key. Our results provide strong evidence that next generation large-scale platforms should not only approach CPU frequency scaling differently, but could also benefit from the capability to tune other platform components, such as the network, to achieve energy efficient performance.

Research Organization:
Sandia National Laboratories (SNL), Albuquerque, NM, and Livermore, CA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC04-94AL85000
OSTI ID:
1035312
Report Number(s):
SAND2011-5702; TRN: US201205%%76
Country of Publication:
United States
Language:
English