Title: Automatic translation of MPI source into a latency-tolerant, data-driven form

Abstract

Hiding communication behind useful computation is an important performance programming technique but remains an inscrutable programming exercise even for the expert. We present Bamboo, a code transformation framework that can realize communication overlap in applications written in MPI without the need to intrusively modify the source code. We reformulate MPI source into a task dependency graph representation, which partially orders the tasks, enabling the program to execute in a data-driven fashion under the control of an external runtime system. Experimental results demonstrate that Bamboo significantly reduces communication delays while requiring only modest amounts of programmer annotation for a variety of applications and platforms, including those employing co-processors and accelerators. Moreover, Bamboo’s performance meets or exceeds that of labor-intensive hand coding. As a result, the translator is more than a means of hiding communication costs automatically; it demonstrates the utility of semantic level optimization against a well-known library.
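
The data-driven execution the abstract describes can be illustrated with a small sketch. It is not Bamboo's implementation and does not use MPI: the `Task` class, the `run_data_driven` function, and the task names `interior`, `boundary`, and `combine` are all hypothetical, chosen only to show how a partial order over tasks lets work that needs no remote data run while a message (here the `halo` input) is still in flight.

```python
from collections import deque


class Task:
    """A unit of work that becomes runnable once all its inputs have arrived."""

    def __init__(self, name, needs, action):
        self.name = name
        self.needs = set(needs)   # names of the inputs this task waits for
        self.inputs = {}          # inputs received so far
        self.action = action      # function(inputs) -> result

    def ready(self):
        return self.needs.issubset(self.inputs)


def run_data_driven(tasks, edges, arrivals):
    """Fire each task as soon as its data is available (hypothetical runtime).

    tasks:    dict mapping task name -> Task
    edges:    dict mapping producer name -> list of consumer names
    arrivals: list of (consumer, input_name, value), standing in for messages
    Returns the order in which tasks ran and each task's result.
    """
    order, results = [], {}
    pending = deque(arrivals)
    while pending:
        consumer, key, value = pending.popleft()
        task = tasks[consumer]
        task.inputs[key] = value
        if task.ready() and task.name not in results:
            results[task.name] = task.action(task.inputs)
            order.append(task.name)
            # Forward the result to every task that depends on this one.
            for successor in edges.get(task.name, []):
                pending.append((successor, task.name, results[task.name]))
    return order, results


tasks = {
    # Needs only local data, so it can run before the remote message arrives.
    "interior": Task("interior", ["local"], lambda d: sum(d["local"])),
    # Needs the remote halo values as well as local data.
    "boundary": Task("boundary", ["local", "halo"],
                     lambda d: d["local"][0] + d["local"][-1] + sum(d["halo"])),
    "combine": Task("combine", ["interior", "boundary"],
                    lambda d: d["interior"] + d["boundary"]),
}
edges = {"interior": ["combine"], "boundary": ["combine"]}

# The halo "message" arrives last, as a slow network transfer would.
arrivals = [
    ("interior", "local", [1, 2, 3]),
    ("boundary", "local", [1, 2, 3]),
    ("boundary", "halo", [0, 4]),
]

order, results = run_data_driven(tasks, edges, arrivals)
print(order)
print(results["combine"])
```

In this toy run the `interior` task completes before the `halo` data arrives, which is the overlap a programmer would otherwise have to arrange by hand with nonblocking communication calls.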

Authors:
Nguyen, Tan [1]; Cicotti, Pietro [1]; Bylaska, Eric [2]; Quinlan, Dan [3]; Baden, Scott [1]
  1. Univ. of California, San Diego, La Jolla, CA (United States)
  2. Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
  3. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Publication Date:
2017-03-06
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1347966
Report Number(s):
PNNL-SA-124997
Journal ID: ISSN 0743-7315; PII: S0743731517300771; TRN: US1700647
Grant/Contract Number:
AC057601830; ER08-191010356-46564-95715; FC02-12ER26118; AC05-76RL01830; OCI-1053575
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Journal of Parallel and Distributed Computing
Additional Journal Information:
Journal Volume: 106; Journal Issue: C; Journal ID: ISSN 0743-7315
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; automatic communication hiding; source-to-source translator; task dependency graph; data-driven execution

Citation Formats

Nguyen, Tan, Cicotti, Pietro, Bylaska, Eric, Quinlan, Dan, and Baden, Scott. Automatic translation of MPI source into a latency-tolerant, data-driven form. United States: N. p., 2017. Web. doi:10.1016/J.JPDC.2017.02.009.
Nguyen, Tan, Cicotti, Pietro, Bylaska, Eric, Quinlan, Dan, & Baden, Scott. Automatic translation of MPI source into a latency-tolerant, data-driven form. United States. doi:10.1016/J.JPDC.2017.02.009.
Nguyen, Tan, Cicotti, Pietro, Bylaska, Eric, Quinlan, Dan, and Baden, Scott. 2017. "Automatic translation of MPI source into a latency-tolerant, data-driven form". United States. doi:10.1016/J.JPDC.2017.02.009. https://www.osti.gov/servlets/purl/1347966.
@article{osti_1347966,
title = {Automatic translation of MPI source into a latency-tolerant, data-driven form},
author = {Nguyen, Tan and Cicotti, Pietro and Bylaska, Eric and Quinlan, Dan and Baden, Scott},
abstractNote = {Hiding communication behind useful computation is an important performance programming technique but remains an inscrutable programming exercise even for the expert. We present Bamboo, a code transformation framework that can realize communication overlap in applications written in MPI without the need to intrusively modify the source code. We reformulate MPI source into a task dependency graph representation, which partially orders the tasks, enabling the program to execute in a data-driven fashion under the control of an external runtime system. Experimental results demonstrate that Bamboo significantly reduces communication delays while requiring only modest amounts of programmer annotation for a variety of applications and platforms, including those employing co-processors and accelerators. Moreover, Bamboo’s performance meets or exceeds that of labor-intensive hand coding. As a result, the translator is more than a means of hiding communication costs automatically; it demonstrates the utility of semantic level optimization against a well-known library.},
doi = {10.1016/J.JPDC.2017.02.009},
journal = {Journal of Parallel and Distributed Computing},
number = {C},
volume = 106,
place = {United States},
year = {2017},
month = {3}
}

  • Hiding communication behind useful computation is an important performance programming technique but remains an inscrutable programming exercise even for the expert. We present Bamboo, a code transformation framework that can realize communication overlap in applications written in MPI without the need to intrusively modify the source code. Bamboo reformulates MPI source into the form of a task dependency graph that expresses a partial ordering among tasks, enabling the program to execute in a data-driven fashion under the control of an external runtime system. Experimental results demonstrate that Bamboo significantly reduces communication delays while requiring only modest amounts of programmer annotation for a variety of applications and platforms, including those employing co-processors and accelerators. Moreover, Bamboo's performance meets or exceeds that of labor-intensive hand coding. The translator is more than a means of hiding communication costs automatically; it demonstrates the utility of semantic level optimization against a well-known library.
  • The goals of this project are to develop new, scalable, high-fidelity algorithms for atomic-level simulations and program transformations that automatically restructure existing applications, enabling them to scale forward to Petascale systems and beyond. The techniques enable legacy MPI application code to exploit greater parallelism through increased latency hiding and improved workload assignment. The techniques were successfully demonstrated on high-end scalable systems located at DOE laboratories. Besides the automatic MPI program transformation efforts, the project also developed several new scalable algorithms for ab-initio molecular dynamics, including new massively parallel algorithms for hybrid DFT and new parallel-in-time algorithms for molecular dynamics and ab-initio molecular dynamics. These algorithms were shown to scale to very large numbers of cores, and they were designed to work in the latency hiding framework developed in this project. The effectiveness of the developments was enhanced by direct application to real grand challenge simulation problems covering a wide range of technologically important applications, time scales and accuracies. These included the simulation of the electronic structure of mineral/fluid interfaces, the very accurate simulation of chemical reactions in microsolvated environments, and the simulation of chemical behavior in very large enzyme reactions.