OSTI.GOV: U.S. Department of Energy
Office of Scientific and Technical Information

Title: Enabling One-Sided Communication Semantics on ARM

Shamis, Pavel [1]; Lopez, Matthew Graham [2]; Shainer, Gilad [3]
  1. ARM Research
  2. ORNL
  3. Mellanox Technologies, Inc.
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
OSTI Identifier:
DOE Contract Number:
Resource Type:
Resource Relation:
Conference: Parallel and Distributed Processing Symposium Workshops - Lake Buena Vista, Florida, United States of America - 5/29/2017
Country of Publication:
United States

Citation Formats

Shamis, Pavel, Lopez, Matthew Graham, and Shainer, Gilad. Enabling One-Sided Communication Semantics on ARM. United States: N. p., 2017. Web. doi:10.1109/IPDPSW.2017.62.
Shamis, Pavel, Lopez, Matthew Graham, & Shainer, Gilad. (2017). Enabling One-Sided Communication Semantics on ARM. United States. doi:10.1109/IPDPSW.2017.62.
Shamis, Pavel, Lopez, Matthew Graham, and Shainer, Gilad. 2017. "Enabling One-Sided Communication Semantics on ARM". United States. doi:10.1109/IPDPSW.2017.62.
title = {Enabling One-Sided Communication Semantics on ARM},
author = {Shamis, Pavel and Lopez, Matthew Graham and Shainer, Gilad},
doi = {10.1109/IPDPSW.2017.62},
place = {United States},
year = {2017},
month = {5}


Similar Records:
  • The Cray Gemini interconnect hardware provides multiple transfer mechanisms and out-of-order message delivery to improve communication throughput. In this paper we quantify the performance of one-sided and two-sided communication paradigms with respect to: 1) the optimal available hardware transfer mechanism, 2) message ordering constraints, 3) per-node and per-core message concurrency. In addition to using Cray native communication APIs, we use UPC and MPI micro-benchmarks to capture one- and two-sided semantics respectively. Our results indicate that relaxing the message delivery order can improve performance by up to 4.6x when compared with strict ordering. When hardware allows it, high-level one-sided programming models can already take advantage of message reordering. Enforcing the ordering semantics of two-sided communication comes with a performance penalty. Furthermore, we argue that exposing out-of-order delivery at the application level is required for the next-generation programming models. Any ordering constraints in the language specifications reduce communication performance for small messages and increase the number of active cores required for peak throughput.
  • In earlier work, we showed that the one-sided communication model found in PGAS languages (such as UPC) offers significant advantages in communication efficiency by decoupling data transfer from processor synchronization. We explore the use of the PGAS model on IBM BlueGene/P, an architecture that combines low-power, quad-core processors with extreme scalability. We demonstrate that the PGAS model, using a new port of the Berkeley UPC compiler and GASNet one-sided communication layer, outperforms two-sided (MPI) communication in both microbenchmarks and a case study of the communication-limited benchmark, NAS FT. We scale the benchmark up to 16,384 cores of the BlueGene/P and demonstrate that UPC consistently outperforms MPI by as much as 66% for some processor configurations and by an average of 32%. Additionally, the results demonstrate the scalability of the PGAS model and the Berkeley implementation of UPC, the viability of using it on machines with multicore nodes, and the effectiveness of the BG/P communication layer for supporting one-sided communication and PGAS languages.
  • This paper discusses the design and implementation of a one-sided communication interface for the IBM Blue Gene/L supercomputer. This interface facilitates ARMCI and the Global Arrays toolkit and can be used by other one-sided communication libraries. New protocols, interrupt-driven communication, and compute node kernel enhancements were required to enable these libraries. Three possible methods for enabling ARMCI on the Blue Gene/L software stack are discussed. A detailed look into the development process shows how the implementation of the one-sided communication interface was completed. This was accomplished on a compressed time scale with the collaboration of various organizations within IBM and open source communities. In addition to enabling the one-sided libraries, bandwidth enhancements were made for communication along a diagonal on the Blue Gene/L torus network. The maximum bandwidth improved by a factor of three. This work will enable a variety of one-sided applications to run on Blue Gene/L.