OSTI.GOV | U.S. Department of Energy
Office of Scientific and Technical Information

Title: An Evaluation of Two Implementation Strategies for Optimizing One-Sided Atomic Reduction

Conference

Traditionally, user-level message-passing libraries (e.g., MPI, PVM) offered only a limited set of operations that combine computation with communication. These are collective operations such as reductions (e.g., MPI_Reduce, MPI_Allreduce), which combine the data in the user communication buffers across the set of tasks participating in the operation. Such operations are widely used in scientific computing [1], for example to determine convergence criteria in iterative methods for solving linear equations or to compute vector dot products in the conjugate gradient solver [2]. Consequently, multiple research efforts have pursued optimizing the performance of these important operations on modern networks. A wide range of implementation protocols and techniques, such as shared memory, remote memory access (RMA), and programmable network interface cards (NICs), has been explored, e.g., [2,3,4]. The most recent extensions to the MPI standard [5] define atomic reductions as one of the one-sided operations available in MPI-2, where they are supported through the MPI_Accumulate operation. This noncollective one-sided operation combines communication and computation in a single interface: it allows the programmer to atomically update remote memory by combining the contents of the local communication buffer with the remote memory buffer. The primary difference between atomic one-sided and collective reductions is that in the former only one processor initiates the operation, and the update is atomic, which allows multiple processors to independently update the same remote memory location without the explicit synchronization that would otherwise be required to ensure a consistent result.
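As a concrete illustration of the one-sided atomic reduction described above, the sketch below shows every rank adding a local buffer into rank 0's window with MPI_Accumulate and MPI_SUM. The buffer sizes, target rank, and variable names are illustrative assumptions, not details from the paper; fence-based synchronization is used here only as one simple way to open and close the access epoch.

```c
/* Sketch (assumed setup): each rank atomically adds its local vector into
 * a window on rank 0 via MPI_Accumulate.  Because accumulates with
 * predefined ops are atomic per element, the updaters need no explicit
 * mutual synchronization to keep the result consistent. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int N = 4;
    double win_buf[4] = {0.0, 0.0, 0.0, 0.0};  /* remotely accessible memory */
    double local[4]   = {1.0, 1.0, 1.0, 1.0};  /* this rank's contribution  */

    MPI_Win win;
    MPI_Win_create(win_buf, N * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);                     /* open access epoch  */
    MPI_Accumulate(local, N, MPI_DOUBLE,
                   0,                          /* target rank        */
                   0,                          /* target displacement*/
                   N, MPI_DOUBLE, MPI_SUM, win);
    MPI_Win_fence(0, win);                     /* close access epoch */

    if (rank == 0)
        printf("win_buf[0] = %g after %d accumulates\n", win_buf[0], nprocs);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

Note the contrast with MPI_Reduce: no matching call is required on the target rank during the epoch, and concurrent accumulates from different origins to the same location remain well defined.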
The sample application domain that motivated the MPI Forum to add atomic reduction to the MPI-2 standard was electronic structure computational chemistry, where multiple algorithms rely on the accumulate operation as available in the Global Arrays toolkit [6].

Research Organization:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-76RL01830
OSTI ID:
914700
Report Number(s):
PNNL-SA-43827; KJ0101030; TRN: US200812%%27
Resource Relation:
Conference: 19th IEEE International Parallel & Distributed Processing Symposium
Country of Publication:
United States
Language:
English