OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Revealing the performance of MPI RMA implementations.

Abstract

The MPI remote-memory access (RMA) operations provide a different programming model from the regular MPI-1 point-to-point operations. This model is particularly appropriate for cases where there are multiple communication events for each synchronization and where the target memory locations are known by the source processes. In this paper, we describe a benchmark designed to illustrate the performance of RMA with multiple RMA operations for each synchronization, as compared with point-to-point communication. We measured the performance of this benchmark on several platforms (SGI Altix, Sun Fire, IBM SMP, Linux cluster) and MPI implementations (SGI, Sun, IBM, MPICH2, Open MPI). We also investigated the effectiveness of the various optimization options specified by the MPI standard. Our results show that MPI RMA can provide substantially higher performance than point-to-point communication on some platforms, such as SGI Altix and Sun Fire. The results also show that many opportunities still exist for performance improvements in the implementation of MPI RMA.
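The following C sketch is not the paper's benchmark; it is a minimal illustration, with assumed parameters (NOPS operations of COUNT doubles per synchronization epoch and a simple neighbor exchange), of the pattern the abstract describes: several MPI_Put operations amortized over one pair of MPI_Win_fence calls, alongside the equivalent nonblocking point-to-point exchange. The fence assertions (MPI_MODE_NOPRECEDE, MPI_MODE_NOSUCCEED) are examples of the optimization hints specified by the MPI standard.

#include <mpi.h>
#include <stdlib.h>

#define NOPS  16      /* RMA operations per synchronization epoch (assumed) */
#define COUNT 1024    /* doubles per operation (assumed) */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *src = malloc(NOPS * COUNT * sizeof(double));
    double *dst = malloc(NOPS * COUNT * sizeof(double));
    for (int i = 0; i < NOPS * COUNT; i++) src[i] = (double)rank;

    /* Expose dst as an RMA window (MPI-2 style). */
    MPI_Win win;
    MPI_Win_create(dst, NOPS * COUNT * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    int target = (rank + 1) % size;
    int source = (rank - 1 + size) % size;

    /* RMA version: one fence pair covers all NOPS puts; the
       assertions are the standard's optimization hints. */
    MPI_Win_fence(MPI_MODE_NOPRECEDE, win);
    for (int i = 0; i < NOPS; i++)
        MPI_Put(src + i * COUNT, COUNT, MPI_DOUBLE,
                target, i * COUNT, COUNT, MPI_DOUBLE, win);
    MPI_Win_fence(MPI_MODE_NOSUCCEED, win);

    /* Point-to-point version of the same data movement: every
       message pays its own matching and completion overhead. */
    MPI_Request reqs[2 * NOPS];
    for (int i = 0; i < NOPS; i++) {
        MPI_Irecv(dst + i * COUNT, COUNT, MPI_DOUBLE, source, i,
                  MPI_COMM_WORLD, &reqs[i]);
        MPI_Isend(src + i * COUNT, COUNT, MPI_DOUBLE, target, i,
                  MPI_COMM_WORLD, &reqs[NOPS + i]);
    }
    MPI_Waitall(2 * NOPS, reqs, MPI_STATUSES_IGNORE);

    MPI_Win_free(&win);
    free(src);
    free(dst);
    MPI_Finalize();
    return 0;
}

Timing the two halves separately (not shown) gives the kind of comparison the benchmark is designed to make; the interesting case is when NOPS is large enough that the per-epoch synchronization cost is amortized.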

Authors:
Gropp, W. D.; Thakur, R.; Mathematics and Computer Science
Publication Date:
2007
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
973468
Report Number(s):
ANL/MCS/CP-59400
TRN: US201006%%415
DOE Contract Number:
DE-AC02-06CH11357
Resource Type:
Conference
Resource Relation:
Journal Name: Lect. Notes Comput. Sci.; Journal Volume: 4757; Journal Issue: 2007; Conference: EuroPVM/MPI 2007; Sep. 30, 2007 - Oct. 3, 2007; Paris, France
Country of Publication:
United States
Language:
ENGLISH
Subject:
97 MATHEMATICAL METHODS AND COMPUTING; 99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; BENCHMARKS; COMPUTER NETWORKS; PERFORMANCE; SYNCHRONIZATION; REMOTE CONTROL; MEMORY DEVICES

Citation Formats

Gropp, W. D., Thakur, R., and Mathematics and Computer Science. Revealing the performance of MPI RMA implementations. United States: N. p., 2007. Web. doi:10.1007/978-3-540-75416-9_38.
Gropp, W. D., Thakur, R., & Mathematics and Computer Science. Revealing the performance of MPI RMA implementations. United States. doi:10.1007/978-3-540-75416-9_38.
Gropp, W. D., Thakur, R., and Mathematics and Computer Science. 2007. "Revealing the performance of MPI RMA implementations." United States. doi:10.1007/978-3-540-75416-9_38.
@article{osti_973468,
title = {Revealing the performance of MPI RMA implementations.},
author = {Gropp, W. D. and Thakur, R. and Mathematics and Computer Science},
abstractNote = {The MPI remote-memory access (RMA) operations provide a different programming model from the regular MPI-1 point-to-point operations. This model is particularly appropriate for cases where there are multiple communication events for each synchronization and where the target memory locations are known by the source processes. In this paper, we describe a benchmark designed to illustrate the performance of RMA with multiple RMA operations for each synchronization, as compared with point-to-point communication. We measured the performance of this benchmark on several platforms (SGI Altix, Sun Fire, IBM SMP, Linux cluster) and MPI implementations (SGI, Sun, IBM, MPICH2, Open MPI). We also investigated the effectiveness of the various optimization options specified by the MPI standard. Our results show that MPI RMA can provide substantially higher performance than point-to-point communication on some platforms, such as SGI Altix and Sun Fire. The results also show that many opportunities still exist for performance improvements in the implementation of MPI RMA.},
doi = {10.1007/978-3-540-75416-9_38},
journal = {Lect. Notes Comput. Sci.},
number = 2007,
volume = 4757,
place = {United States},
year = {2007}
}


Similar Records:
  • The MPI-2 Standard, released in 1997, defined an interface for one-sided communication, also known as remote memory access (RMA). It was designed with the goal that it should permit efficient implementations on multiple platforms and networking technologies, as well as in heterogeneous environments and on non-cache-coherent systems. Nonetheless, even 12 years after its introduction, the MPI-2 RMA interface remains scarcely used, for a number of reasons. This paper discusses the limitations of the MPI-2 RMA specification, outlines the goals and requirements for a new RMA API that would better meet the needs of both users and implementers, and presents a strawman proposal for such an API. We also study the tradeoffs facing the design of this new API and discuss how it may be implemented efficiently on both cache-coherent and non-cache-coherent systems.
  • As parallel systems are commonly being built out of increasingly large multicore chips, application programmers are exploring the use of hybrid programming models that combine MPI across nodes with multithreading within a node. Many MPI implementations, however, are just starting to support multithreaded MPI communication, often focusing on correctness first and performance later. As a result, both users and implementers need some measure for evaluating the multithreaded performance of an MPI implementation. In this paper, we propose a number of performance tests that are motivated by typical application scenarios. These tests cover the overhead of providing the MPI_THREAD_MULTIPLE level of thread safety for user programs, the amount of concurrency in different threads making MPI calls, the ability to overlap communication with computation, and other features. We present performance results with this test suite on several platforms (Linux cluster, Sun and IBM SMPs) and MPI implementations (MPICH2, Open MPI, IBM, and Sun). (A minimal sketch of the MPI_THREAD_MULTIPLE pattern these tests exercise appears after this list.)
  • The building blocks of emerging petascale massively parallel processing (MPP) systems are multi-core processors, with four or more cores forming a single processing element, and a customized network interface. The resulting memory and communication hierarchies of these platforms are now exposed to application developers and end users through a hierarchical, multi-core-aware message-passing (MPI) programming interface and a handful of tunable runtime parameters that allow mapping and control of MPI tasks and message handling. We characterize the performance of MPI communication patterns and present strategies for optimizing application performance on Cray XT series systems, which are composed of contemporary AMD processors and a proprietary network infrastructure. We highlight dependencies in their memory and network subsystems that could influence the performance of production-level applications. We demonstrate that MPI micro-benchmarks can mislead an application developer or end user, since these benchmarks often do not expose the interplay between memory allocation and usage in user space, which depends on the number of tasks or cores and on workload characteristics. Our studies show performance improvements over the default options for our target scientific benchmarks and production-level applications.
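As a rough companion to the thread-safety tests described in the second item above, here is a minimal, hypothetical C sketch (not the actual test suite; the thread count, message size, and pairing of ranks are assumptions) of the pattern being measured: requesting MPI_THREAD_MULTIPLE at initialization and letting several OpenMP threads issue concurrent point-to-point calls.

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define NTHREADS 4      /* threads making concurrent MPI calls (assumed) */
#define COUNT    1024   /* doubles per message (assumed) */

int main(int argc, char **argv)
{
    int provided, rank, size;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI_THREAD_MULTIPLE not available\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int peer = rank ^ 1;              /* pair ranks 0-1, 2-3, ... */
    if (peer >= size) peer = rank;    /* odd process count: talk to self */

    /* Each thread exchanges its own message with the peer rank,
       using its thread id as the tag so the matches stay separate. */
    #pragma omp parallel num_threads(NTHREADS)
    {
        int tag = omp_get_thread_num();
        double sendbuf[COUNT], recvbuf[COUNT];
        for (int i = 0; i < COUNT; i++) sendbuf[i] = (double)tag;
        MPI_Sendrecv(sendbuf, COUNT, MPI_DOUBLE, peer, tag,
                     recvbuf, COUNT, MPI_DOUBLE, peer, tag,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}

Timing the exchange with one thread versus NTHREADS threads (not shown) is the kind of comparison such tests use to expose the cost of the MPI_THREAD_MULTIPLE level of thread safety and the degree of concurrency an implementation actually provides.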