Title: A uGNI-based Asynchronous Message-driven Runtime System for Cray Supercomputers with Gemini Interconnect

Abstract

Gemini, the network for the new Cray XE/XK systems, features low latency, high bandwidth and strong scalability. Its hardware support for remote direct memory access enables efficient implementation of global address space programming languages. Although the user Generic Network Interface (uGNI) provides a low-level interface for Gemini with support for the message-passing programming model (MPI), it remains challenging to port alternative programming models with scalable performance. CHARM++ is an object-oriented message-driven programming model. Its applications have been shown to scale up to the full Jaguar Cray XT machine. In this paper, we present an implementation of this programming model on uGNI for the Cray XE/XK systems. Several techniques are presented to exploit the uGNI capabilities by reducing memory copy and registration overhead, taking advantage of persistent communication, and improving intra-node communication. Our microbenchmark results demonstrate that the uGNI-based runtime system outperforms the MPI-based implementation by up to 50% in terms of message latency. For communication-intensive applications such as N-Queens, this implementation scales up to 15,360 cores of a Cray XE6 machine and is 70% faster than the MPI-based implementation. In the molecular dynamics application NAMD, performance is also improved by as much as 18%.

Authors:
Sun, Yanhua; Zheng, Gengbin; Kale, Laximant V.; Jones, Terry R.; Olson, Ryan
Publication Date:
Research Org.:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1567313
Resource Type:
Conference
Journal Name:
2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS)
Additional Journal Information:
Conference: 2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), May 21-25, 2012, Shanghai, China
Country of Publication:
United States
Language:
English
Subject:
Computer Science; Engineering

Citation Formats

Sun, Yanhua, Zheng, Gengbin, Kale, Laximant V., Jones, Terry R., and Olson, Ryan. A uGNI-based Asynchronous Message-driven Runtime System for Cray Supercomputers with Gemini Interconnect. United States: N. p., 2012. Web. doi:10.1109/IPDPS.2012.127.
Sun, Yanhua, Zheng, Gengbin, Kale, Laximant V., Jones, Terry R., & Olson, Ryan. A uGNI-based Asynchronous Message-driven Runtime System for Cray Supercomputers with Gemini Interconnect. United States. doi:10.1109/IPDPS.2012.127.
Sun, Yanhua, Zheng, Gengbin, Kale, Laximant V., Jones, Terry R., and Olson, Ryan. "A uGNI-based Asynchronous Message-driven Runtime System for Cray Supercomputers with Gemini Interconnect". United States. doi:10.1109/IPDPS.2012.127.
@article{osti_1567313,
title = {A uGNI-based Asynchronous Message-driven Runtime System for Cray Supercomputers with Gemini Interconnect},
author = {Sun, Yanhua and Zheng, Gengbin and Kale, Laximant V. and Jones, Terry R. and Olson, Ryan},
abstractNote = {Gemini, the network for the new Cray XE/XK systems, features low latency, high bandwidth and strong scalability. Its hardware support for remote direct memory access enables efficient implementation of global address space programming languages. Although the user Generic Network Interface (uGNI) provides a low-level interface for Gemini with support for the message-passing programming model (MPI), it remains challenging to port alternative programming models with scalable performance. CHARM++ is an object-oriented message-driven programming model. Its applications have been shown to scale up to the full Jaguar Cray XT machine. In this paper, we present an implementation of this programming model on uGNI for the Cray XE/XK systems. Several techniques are presented to exploit the uGNI capabilities by reducing memory copy and registration overhead, taking advantage of persistent communication, and improving intra-node communication. Our microbenchmark results demonstrate that the uGNI-based runtime system outperforms the MPI-based implementation by up to 50% in terms of message latency. For communication-intensive applications such as N-Queens, this implementation scales up to 15,360 cores of a Cray XE6 machine and is 70% faster than the MPI-based implementation. In the molecular dynamics application NAMD, performance is also improved by as much as 18%.},
doi = {10.1109/IPDPS.2012.127},
journal = {2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS)},
number = {},
volume = {},
place = {United States},
year = {2012},
month = {8}
}
