OSTI.GOV | U.S. Department of Energy
Office of Scientific and Technical Information

Title: An Evaluation of Two Implementation Strategies for Optimizing One-Sided Atomic Reduction

Conference

Traditionally, user-level message-passing libraries (e.g., MPI, PVM) offered only a limited set of operations that combine computation with communication. These are collective operations such as reductions (e.g., MPI_Reduce, MPI_Allreduce), which combine the data in the user communication buffers across the set of tasks participating in the operation. Such operations are widely used in scientific computing [1], for example to determine convergence criteria in iterative methods for solving linear equations or to compute vector dot products in the conjugate gradient solver [2]. Consequently, multiple research efforts have pursued optimizing the performance of these important operations on modern networks. A wide range of implementation protocols and techniques, such as shared memory, remote memory access (RMA), and programmable network interface cards (NICs), has been explored, e.g., [2,3,4]. The most recent extensions to the MPI standard [5] define atomic reductions as one of the one-sided operations available in MPI-2, where they are supported through the MPI_Accumulate operation. This noncollective one-sided operation combines communication and computation in a single interface: it allows the programmer to atomically update remote memory by combining the contents of the local communication buffer with the remote memory buffer. The primary difference between atomic one-sided and collective reductions is that in the former only one processor initiates the operation, and the update is atomic, which allows multiple processors to independently update the same remote memory location without the explicit synchronization that would otherwise be required to ensure a consistent result.
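As a concrete illustration of the one-sided atomic reduction described above, the sketch below shows every rank adding a local buffer into rank 0's window with MPI_Accumulate and MPI_SUM. The buffer sizes, target rank, and variable names are illustrative assumptions, not details from the paper; fence-based synchronization is used here only as one simple way to open and close the access epoch.

```c
/* Sketch (assumed setup): each rank atomically adds its local vector into
 * a window on rank 0 via MPI_Accumulate.  Because accumulates with
 * predefined ops are atomic per element, the updaters need no explicit
 * mutual synchronization to keep the result consistent. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int N = 4;
    double win_buf[4] = {0.0, 0.0, 0.0, 0.0};  /* remotely accessible memory */
    double local[4]   = {1.0, 1.0, 1.0, 1.0};  /* this rank's contribution  */

    MPI_Win win;
    MPI_Win_create(win_buf, N * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);                     /* open access epoch  */
    MPI_Accumulate(local, N, MPI_DOUBLE,
                   0,                          /* target rank        */
                   0,                          /* target displacement*/
                   N, MPI_DOUBLE, MPI_SUM, win);
    MPI_Win_fence(0, win);                     /* close access epoch */

    if (rank == 0)
        printf("win_buf[0] = %g after %d accumulates\n", win_buf[0], nprocs);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

Note the contrast with MPI_Reduce: no matching call is required on the target rank during the epoch, and concurrent accumulates from different origins to the same location remain well defined.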
The sample application domain that motivated the MPI Forum to add atomic reduction to the MPI-2 standard was electronic structure computational chemistry, where multiple algorithms rely on the accumulate operation as available in the Global Arrays toolkit [6].

Research Organization:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-76RL01830
OSTI ID:
914700
Report Number(s):
PNNL-SA-43827; KJ0101030; TRN: US200812%%27
Resource Relation:
Conference: 19th IEEE International Parallel & Distributed Processing Symposium
Country of Publication:
United States
Language:
English