OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Evaluating Contexts in OpenSHMEM-X Reference Implementation

Bouteiller, Aurelien [1]; Gorentla Venkata, Manjunath [2]; Baker, Matthew B. [2]; Boehm, Swen [2]; Pophale, Swaroop S. [2]
  1. The University of Tennessee, Knoxville
  2. Oak Ridge National Lab. (ORNL)
Publication Date: August 2017
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Resource Relation:
Conference: Fourth OpenSHMEM Workshop, Annapolis, Virginia, United States of America, August 7-9, 2017
Country of Publication:
United States

Citation Formats

Bouteiller, Aurelien, Gorentla Venkata, Manjunath, Baker, Matthew B., Boehm, Swen, and Pophale, Swaroop S. Evaluating Contexts in OpenSHMEM-X Reference Implementation. United States: N. p., 2017. Web.
Bouteiller, Aurelien, Gorentla Venkata, Manjunath, Baker, Matthew B., Boehm, Swen, & Pophale, Swaroop S. Evaluating Contexts in OpenSHMEM-X Reference Implementation. United States.
Bouteiller, Aurelien, Gorentla Venkata, Manjunath, Baker, Matthew B., Boehm, Swen, and Pophale, Swaroop S. 2017. "Evaluating Contexts in OpenSHMEM-X Reference Implementation". United States.
@inproceedings{bouteiller2017contexts,
  title  = {Evaluating Contexts in OpenSHMEM-X Reference Implementation},
  author = {Bouteiller, Aurelien and Gorentla Venkata, Manjunath and Baker, Matthew B. and Boehm, Swen and Pophale, Swaroop S.},
  place  = {United States},
  year   = {2017},
  month  = {8}
}

Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.

  • The OpenSHMEM Library Specification has evolved considerably since version 1.0. Recently, non-blocking implicit Remote Memory Access (RMA) operations were introduced in OpenSHMEM 1.3. These provide a way to achieve better overlap between communication and computation. However, the implicit non-blocking operations do not provide a separate handle to track and complete the individual RMA operations. They are guaranteed to be completed only after shmem_quiet(), shmem_barrier(), or shmem_barrier_all() is called; these are global completion and synchronization operations. Though this semantic is expected to achieve a higher message rate for applications, the drawback is that it does not allow fine-grained control over the completion of RMA operations. In this paper, first, we introduce non-blocking RMA operations with requests, where each operation has an explicit request to track and complete the operation. Second, we introduce interfaces to merge multiple requests into a single request handle. The merged request tracks multiple user-selected RMA operations, which provides the flexibility of tracking related communication operations with one request handle. Lastly, we explore the implications in terms of performance, productivity, usability, and the possibility of defining different patterns of communication via merging of requests. Our experimental results show that a well designed and implemented OpenSHMEM stack can hide the overhead of allocating and managing the requests. The latency of RMA operations with requests is similar to that of blocking and implicit non-blocking RMA operations. We test our implementation with the Scalable Synthetic Compact Applications (SSCA #1) benchmark and observe that using RMA operations with requests and merging these requests outperforms the implementation using blocking RMA operations and implicit non-blocking operations by 49% and 74%, respectively.
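    The explicit-request and request-merging semantics described in the abstract can be modeled abstractly. Below is a toy Python sketch, not the OpenSHMEM C API the paper proposes: all class and method names are illustrative assumptions, and completion is driven by hand rather than by a network runtime. It shows the key idea that a merged handle completes only once every member operation completes, while unmerged operations remain independently trackable.

    ```python
    class Request:
        """Toy model of an explicit completion handle for one RMA operation."""
        def __init__(self):
            self._done = False

        def complete(self):
            # In a real runtime, completion would be signaled by the network layer.
            self._done = True

        def test(self):
            # Non-blocking check: has this operation finished?
            return self._done

    class MergedRequest(Request):
        """A single handle that tracks several user-selected requests."""
        def __init__(self, requests):
            super().__init__()
            self._members = list(requests)

        def test(self):
            # The merged handle is complete only when all members are complete.
            return all(r.test() for r in self._members)

    # Issue three "operations", merge two of them, and track them together.
    a, b, c = Request(), Request(), Request()
    merged = MergedRequest([a, b])

    a.complete()
    print(merged.test())   # False: b is still in flight
    b.complete()
    print(merged.test())   # True: every merged member has completed
    print(c.test())        # False: c is tracked independently
    ```

    The merged handle lets related operations (for example, all puts belonging to one halo exchange) be waited on as a unit without forcing a global shmem_quiet().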
  • OpenSHMEM is an effort to standardize the well-known SHMEM parallel programming library. The project aims to produce an open-source and portable SHMEM API and is led by ORNL and UH. In this paper, we optimize the current OpenSHMEM reference implementation, based on GASNet, to achieve higher performance characteristics. To achieve these desired performance characteristics, we have redesigned an important component of the OpenSHMEM implementation, the network layer, to leverage UCCS, a low-level communication library designed for implementing parallel programming models. In particular, UCCS provides an interface and semantics such as native atomic operations and remote memory operations to better support PGAS programming models, including OpenSHMEM. Through the use of microbenchmarks, we evaluate this new OpenSHMEM implementation on various network metrics, including the latency of point-to-point and collective operations. Furthermore, we compare the performance of our OpenSHMEM implementation with the state-of-the-art SGI SHMEM. Our results show that the atomic operations of our OpenSHMEM implementation outperform SGI's SHMEM implementation by 3%. Its RMA operations outperform both SGI's SHMEM and the original OpenSHMEM reference implementation by as much as 18% and 12% for gets, and by as much as 83% and 53% for puts.
  • We describe the effort to implement the HPCG benchmark using OpenSHMEM and MPI one-sided communication. Unlike the High Performance LINPACK (HPL) benchmark, which places emphasis on large dense matrix computations, the HPCG benchmark is dominated by sparse operations such as sparse matrix-vector product, sparse matrix triangular solve, and long vector operations. The MPI one-sided implementation is developed using the one-sided OpenSHMEM implementation. Preliminary results comparing the original MPI, OpenSHMEM, and MPI one-sided implementations on an SGI cluster, a Cray XK7, and a Cray XC30 are presented. The results suggest that the MPI, OpenSHMEM, and MPI one-sided implementations all obtain similar overall performance, but the MPI one-sided implementation seems to slightly increase the run time for multigrid preconditioning in HPCG on the Cray XK7 and Cray XC30.