skip to main content

SciTech ConnectSciTech Connect

Title: A Case for Application Oblivious Energy-Efficient MPI Runtime

Power has become the major impediment in designing large scale high-end systems. Message Passing Interface (MPI) is the {\em de facto} communication interface used as the back-end for designing applications, programming models and runtime for these systems. Slack --- the time spent by an MPI process in a single MPI call --- provides a potential for energy and power savings, if an appropriate power reduction technique such as core-idling/Dynamic Voltage and Frequency Scaling (DVFS) can be applied without perturbing application's execution time. Existing techniques that exploit slack for power savings assume that application behavior repeats across iterations/executions. However, an increasing use of adaptive, data-dependent workloads combined with system factors (OS noise, congestion) makes this assumption invalid. This paper proposes and implements Energy Aware MPI (EAM) --- an application-oblivious energy-efficient MPI runtime. EAM uses a combination of communication models of common MPI primitives (point-to-point, collective, progress, blocking/non-blocking) and an online observation of slack for maximizing energy efficiency. Each power lever incurs time overhead, which must be amortized over slack to minimize degradation. When predicted communication time exceeds a lever overhead, the lever is used {\em as soon as possible} --- to maximize energy efficiency. When mis-prediction occurs, the lever(s) are usedmore » automatically at specific intervals for amortization. We implement EAM using MVAPICH2 and evaluate it on ten applications using up to 4096 processes. Our performance evaluation on an InfiniBand cluster indicates that EAM can reduce energy consumption by 5--41\% in comparison to the default approach, with negligible (less than 4\% in all cases) performance loss.« less
; ; ; ; ; ;
Publication Date:
OSTI Identifier:
Report Number(s):
DOE Contract Number:
Resource Type:
Resource Relation:
Conference: SC15 Proceedings: International Conference on High Performance Computing, Networking, Storage and Analysis, November 15-20, 2015, Austin, Texas, Paper No. 29
Association of Computing Machinery (ACM), New York, NY, United States(US).
Research Org:
Pacific Northwest National Laboratory (PNNL), Richland, WA (US)
Sponsoring Org:
Country of Publication:
United States