OSTI.GOV
U.S. Department of Energy, Office of Scientific and Technical Information

Title: Exploiting communication concurrency on high performance computing systems

Abstract

Applications may not exploit enough instantaneous communication concurrency, even when it is logically available, to maximize hardware utilization on HPC systems. This is exacerbated in hybrid programming models such as SPMD+OpenMP. We present the design of a "multi-threaded" runtime able to transparently increase the instantaneous network concurrency and to provide near-saturation bandwidth, independent of the application configuration and dynamic behavior. The runtime forwards communication requests from application-level tasks to multiple communication servers. Our techniques alleviate the need for spatial and temporal application-level message concurrency optimizations. Experimental results show message throughput and bandwidth improved by as much as 150% for 4 KB messages on InfiniBand and by as much as 120% for 4 KB messages on Cray Aries. For more complex operations such as all-to-all collectives, we observe as much as 30% speedup. This translates into a 23% speedup on 12,288 cores for a NAS FT benchmark implemented using FFTW. We also observe as much as 76% speedup on 1,500 cores for an already optimized UPC+OpenMP geometric multigrid application using hybrid parallelism.
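The forwarding idea described in the abstract can be illustrated with a toy sketch. This is not the authors' runtime: the names `CommServerPool`, `submit`, and `fake_send` are hypothetical, the round-robin dispatch policy is an assumption, and Python threads with in-memory queues stand in for real communication servers and network operations.

```python
import queue
import threading


class CommServerPool:
    """Toy stand-in for a set of communication servers (hypothetical design)."""

    def __init__(self, num_servers, send_fn):
        # One request queue per communication server.
        self._queues = [queue.Queue() for _ in range(num_servers)]
        self._send_fn = send_fn  # assumed callback that performs the "send"
        self._next = 0
        self._lock = threading.Lock()
        self._threads = [
            threading.Thread(target=self._serve, args=(i,), daemon=True)
            for i in range(num_servers)
        ]
        for t in self._threads:
            t.start()

    def _serve(self, server_id):
        # Each server drains its own queue until it sees the None sentinel.
        q = self._queues[server_id]
        while True:
            msg = q.get()
            if msg is None:
                break
            self._send_fn(server_id, msg)

    def submit(self, msg):
        # Application tasks forward requests here; round-robin dispatch
        # (an assumed policy) spreads them across the servers.
        with self._lock:
            target = self._next
            self._next = (self._next + 1) % len(self._queues)
        self._queues[target].put(msg)

    def shutdown(self):
        # Sentinels are queued after all pending messages, so every
        # message is delivered before the servers exit.
        for q in self._queues:
            q.put(None)
        for t in self._threads:
            t.join()


def demo(num_tasks=2, msgs_per_task=6, num_servers=3):
    delivered = []
    delivered_lock = threading.Lock()

    def fake_send(server_id, msg):
        # Placeholder for a real network operation.
        with delivered_lock:
            delivered.append((server_id, msg))

    pool = CommServerPool(num_servers, fake_send)

    def task(task_id):
        for i in range(msgs_per_task):
            pool.submit((task_id, i))

    tasks = [threading.Thread(target=task, args=(t,)) for t in range(num_tasks)]
    for t in tasks:
        t.start()
    for t in tasks:
        t.join()
    pool.shutdown()
    return delivered
```

Calling `demo()` returns a list of `(server_id, message)` pairs. In this sketch, two application tasks end up driving three concurrent servers, which is the sense in which forwarding can raise instantaneous communication concurrency beyond what the application itself expresses.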

Authors:
Chaimov, Nicholas [1]; Ibrahim, Khaled Z. [2]; Williams, Samuel [2]; Iancu, Costin [2]
  1. Univ. of Oregon, Eugene, OR (United States)
  2. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Publication Date:
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
OSTI Identifier:
1407278
DOE Contract Number:  
AC02-05CH11231
Resource Type:
Conference
Resource Relation:
Conference: Proceedings of the 6th International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM 2015), San Francisco, CA (United States), 7-8 Feb 2015
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Chaimov, Nicholas, Ibrahim, Khaled Z., Williams, Samuel, and Iancu, Costin. Exploiting communication concurrency on high performance computing systems. United States: N. p., 2015. Web. doi:10.1145/2712386.2712394.
Chaimov, Nicholas, Ibrahim, Khaled Z., Williams, Samuel, & Iancu, Costin. Exploiting communication concurrency on high performance computing systems. United States. doi:10.1145/2712386.2712394.
Chaimov, Nicholas, Ibrahim, Khaled Z., Williams, Samuel, and Iancu, Costin. 2015. "Exploiting communication concurrency on high performance computing systems". United States. doi:10.1145/2712386.2712394. https://www.osti.gov/servlets/purl/1407278.
@article{osti_1407278,
title = {Exploiting communication concurrency on high performance computing systems},
author = {Chaimov, Nicholas and Ibrahim, Khaled Z. and Williams, Samuel and Iancu, Costin},
abstractNote = {Applications may not exploit enough instantaneous communication concurrency, even when it is logically available, to maximize hardware utilization on HPC systems. This is exacerbated in hybrid programming models such as SPMD+OpenMP. We present the design of a "multi-threaded" runtime able to transparently increase the instantaneous network concurrency and to provide near-saturation bandwidth, independent of the application configuration and dynamic behavior. The runtime forwards communication requests from application-level tasks to multiple communication servers. Our techniques alleviate the need for spatial and temporal application-level message concurrency optimizations. Experimental results show message throughput and bandwidth improved by as much as 150% for 4 KB messages on InfiniBand and by as much as 120% for 4 KB messages on Cray Aries. For more complex operations such as all-to-all collectives, we observe as much as 30% speedup. This translates into a 23% speedup on 12,288 cores for a NAS FT benchmark implemented using FFTW. We also observe as much as 76% speedup on 1,500 cores for an already optimized UPC+OpenMP geometric multigrid application using hybrid parallelism.},
doi = {10.1145/2712386.2712394},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2015},
month = {1}
}

