OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: HiCOO: Hierarchical cooperation for scalable communication in Global Address Space programming models on Cray XT systems

Abstract

Global Address Space (GAS) programming models enable a convenient, shared-memory style addressing model. Typically this is realized through one-sided operations that can enable asynchronous communication and data movement. With the size of petascale systems reaching 10,000s of nodes and 100,000s of cores, the underlying runtime systems face critical challenges in (1) scalably managing resources (such as memory for communication buffers), and (2) gracefully handling unpredictable communication patterns and any associated contention. For any solution that addresses these resource scalability challenges, equally important is the need to maintain the performance of GAS programming models. In this paper, we describe a Hierarchical COOperation (HiCOO) architecture for scalable communication in GAS programming models. HiCOO formulates a cooperative communication architecture: with inter-node cooperation amongst multiple nodes (a.k.a multinode) and hierarchical cooperation among multinodes that are arranged in various virtual topologies. We have implemented HiCOO for a popular GAS runtime library, Aggregate Remote Memory Copy Interface (ARMCI). By extensively evaluating different virtual topologies in HiCOO in terms of their impact to memory scalability, network contention, and application performance, we identify MFCG as the most suitable virtual topology. The resulting HiCOO architecture is able to realize scalable resource management and achieve resilience to network contention, while at the same time maintaining or enhancing the performance of scientific applications. In one case, it reduces the total execution time of an NWChem application by 52%.

Authors:
Yu, Weikuan; Que, Xinyu; Tipparaju, Vinod; Vetter, Jeffrey S.
Publication Date:
2012-11
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); UT-Battelle LLC/ORNL, Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1564969
DOE Contract Number:  
AC05-00OR22725
Resource Type:
Journal Article
Journal Name:
Journal of Parallel and Distributed Computing
Additional Journal Information:
Journal Volume: 72; Journal Issue: 11; Journal ID: ISSN 0743-7315
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
Computer Science

Citation Formats

Yu, Weikuan, Que, Xinyu, Tipparaju, Vinod, and Vetter, Jeffrey S. HiCOO: Hierarchical cooperation for scalable communication in Global Address Space programming models on Cray XT systems. United States: N. p., 2012. Web. doi:10.1016/j.jpdc.2012.01.022.
Yu, Weikuan, Que, Xinyu, Tipparaju, Vinod, & Vetter, Jeffrey S. HiCOO: Hierarchical cooperation for scalable communication in Global Address Space programming models on Cray XT systems. United States. doi:10.1016/j.jpdc.2012.01.022.
Yu, Weikuan, Que, Xinyu, Tipparaju, Vinod, and Vetter, Jeffrey S. 2012. "HiCOO: Hierarchical cooperation for scalable communication in Global Address Space programming models on Cray XT systems". United States. doi:10.1016/j.jpdc.2012.01.022.
@article{osti_1564969,
title = {HiCOO: Hierarchical cooperation for scalable communication in Global Address Space programming models on Cray XT systems},
author = {Yu, Weikuan and Que, Xinyu and Tipparaju, Vinod and Vetter, Jeffrey S.},
abstractNote = {Global Address Space (GAS) programming models enable a convenient, shared-memory style addressing model. Typically this is realized through one-sided operations that can enable asynchronous communication and data movement. With the size of petascale systems reaching 10,000s of nodes and 100,000s of cores, the underlying runtime systems face critical challenges in (1) scalably managing resources (such as memory for communication buffers), and (2) gracefully handling unpredictable communication patterns and any associated contention. For any solution that addresses these resource scalability challenges, equally important is the need to maintain the performance of GAS programming models. In this paper, we describe a Hierarchical COOperation (HiCOO) architecture for scalable communication in GAS programming models. HiCOO formulates a cooperative communication architecture: with inter-node cooperation amongst multiple nodes (a.k.a multinode) and hierarchical cooperation among multinodes that are arranged in various virtual topologies. We have implemented HiCOO for a popular GAS runtime library, Aggregate Remote Memory Copy Interface (ARMCI). By extensively evaluating different virtual topologies in HiCOO in terms of their impact to memory scalability, network contention, and application performance, we identify MFCG as the most suitable virtual topology. The resulting HiCOO architecture is able to realize scalable resource management and achieve resilience to network contention, while at the same time maintaining or enhancing the performance of scientific applications. In one case, it reduces the total execution time of an NWChem application by 52%.},
doi = {10.1016/j.jpdc.2012.01.022},
journal = {Journal of Parallel and Distributed Computing},
issn = {0743-7315},
number = 11,
volume = 72,
place = {United States},
year = {2012},
month = {11}
}