OSTI.GOV | U.S. Department of Energy
Office of Scientific and Technical Information

Title: Automatic translation of MPI source into a latency-tolerant, data-driven form

Abstract

Hiding communication behind useful computation is an important performance programming technique but remains an inscrutable programming exercise even for the expert. We present Bamboo, a code transformation framework that can realize communication overlap in applications written in MPI without the need to intrusively modify the source code. We reformulate MPI source into a task dependency graph representation, which partially orders the tasks, enabling the program to execute in a data-driven fashion under the control of an external runtime system. Experimental results demonstrate that Bamboo significantly reduces communication delays while requiring only modest amounts of programmer annotation for a variety of applications and platforms, including those employing co-processors and accelerators. Moreover, Bamboo’s performance meets or exceeds that of labor-intensive hand coding. As a result, the translator is more than a means of hiding communication costs automatically; it demonstrates the utility of semantic level optimization against a well-known library.
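The data-driven execution described in the abstract can be illustrated with a small sketch. This is not Bamboo's translator or runtime: the function `run_data_driven`, the task names, and the single-threaded scheduling loop are assumptions chosen for illustration only. It shows how a task dependency graph imposes only a partial order, so work that does not depend on an incoming message (here, a hypothetical `compute_interior`) is free to run before the task that consumes that message.

```python
from collections import deque


def run_data_driven(tasks, deps):
    """Fire tasks in an order consistent with the partial order in `deps`.

    tasks: dict mapping task name -> zero-argument callable
    deps:  dict mapping task name -> set of task names it must wait for
    Returns the list of task names in the order they fired.
    (Hypothetical helper for illustration; not part of Bamboo or MPI.)
    """
    # Number of unmet dependencies for every task.
    pending = {name: len(deps.get(name, ())) for name in tasks}

    # Reverse edges: which tasks are waiting on each task?
    waiters = {name: [] for name in tasks}
    for name, needs in deps.items():
        for need in needs:
            waiters[need].append(name)

    # A task becomes ready as soon as all of its inputs are available.
    ready = deque(name for name in tasks if pending[name] == 0)
    fired = []
    while ready:
        name = ready.popleft()
        tasks[name]()
        fired.append(name)
        for waiter in waiters[name]:
            pending[waiter] -= 1
            if pending[waiter] == 0:
                ready.append(waiter)

    if len(fired) != len(tasks):
        raise ValueError("dependency cycle detected")
    return fired


# Hypothetical stencil-style step: only the boundary update needs the halo.
log = []
tasks = {
    "compute_interior": lambda: log.append("interior updated"),
    "recv_halo": lambda: log.append("halo received"),
    "compute_boundary": lambda: log.append("boundary updated"),
}
deps = {"compute_boundary": {"recv_halo"}}

order = run_data_driven(tasks, deps)
print(order)  # ['compute_interior', 'recv_halo', 'compute_boundary']
```

In a real MPI setting the `recv_halo` task would complete only when the message arrives, and a runtime would keep firing other ready tasks in the meantime; the sketch models only the ordering constraint, not the asynchronous communication itself.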

Authors:
Nguyen, Tan [1]; Cicotti, Pietro [1]; Bylaska, Eric [2]; Quinlan, Dan [3]; Baden, Scott [1]
  1. Univ. of California, San Diego, La Jolla, CA (United States)
  2. Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
  3. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Publication Date:
2017-03-06
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1347966
Report Number(s):
PNNL-SA-124997
Journal ID: ISSN 0743-7315; PII: S0743731517300771; TRN: US1700647
Grant/Contract Number:  
AC057601830; ER08-191010356-46564-95715; FC02-12ER26118; AC05-76RL01830; OCI-1053575
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Journal of Parallel and Distributed Computing
Additional Journal Information:
Journal Volume: 106; Journal Issue: C; Journal ID: ISSN 0743-7315
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; automatic communication hiding; source-to-source translator; task dependency graph; data-driven execution

Citation Formats

Nguyen, Tan, Cicotti, Pietro, Bylaska, Eric, Quinlan, Dan, and Baden, Scott. Automatic translation of MPI source into a latency-tolerant, data-driven form. United States: N. p., 2017. Web. doi:10.1016/J.JPDC.2017.02.009.
Nguyen, Tan, Cicotti, Pietro, Bylaska, Eric, Quinlan, Dan, & Baden, Scott. (2017). Automatic translation of MPI source into a latency-tolerant, data-driven form. United States. doi:10.1016/J.JPDC.2017.02.009.
Nguyen, Tan, Cicotti, Pietro, Bylaska, Eric, Quinlan, Dan, and Baden, Scott. 2017. "Automatic translation of MPI source into a latency-tolerant, data-driven form". United States. doi:10.1016/J.JPDC.2017.02.009. https://www.osti.gov/servlets/purl/1347966.
@article{osti_1347966,
title = {Automatic translation of MPI source into a latency-tolerant, data-driven form},
author = {Nguyen, Tan and Cicotti, Pietro and Bylaska, Eric and Quinlan, Dan and Baden, Scott},
abstractNote = {Hiding communication behind useful computation is an important performance programming technique but remains an inscrutable programming exercise even for the expert. We present Bamboo, a code transformation framework that can realize communication overlap in applications written in MPI without the need to intrusively modify the source code. We reformulate MPI source into a task dependency graph representation, which partially orders the tasks, enabling the program to execute in a data-driven fashion under the control of an external runtime system. Experimental results demonstrate that Bamboo significantly reduces communication delays while requiring only modest amounts of programmer annotation for a variety of applications and platforms, including those employing co-processors and accelerators. Moreover, Bamboo’s performance meets or exceeds that of labor-intensive hand coding. As a result, the translator is more than a means of hiding communication costs automatically; it demonstrates the utility of semantic level optimization against a well-known library.},
doi = {10.1016/J.JPDC.2017.02.009},
journal = {Journal of Parallel and Distributed Computing},
number = {C},
volume = 106,
place = {United States},
year = {2017},
month = mar
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record
