OSTI.GOV
U.S. Department of Energy, Office of Scientific and Technical Information

Title: Advanced Thread Synchronization for Multithreaded MPI Implementations

Authors:
Dang, Hoang-Vu; Seo, Sangmin; Amer, Abdelhalim; Balaji, Pavan
Publication Date:
2017
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22)
OSTI Identifier:
1364660
DOE Contract Number:
AC02-06CH11357
Resource Type:
Conference
Resource Relation:
Conference: 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2017), May 14-17, 2017, Madrid, Spain
Country of Publication:
United States
Language:
English

Citation Formats

Dang, Hoang-Vu, Seo, Sangmin, Amer, Abdelhalim, and Balaji, Pavan. Advanced Thread Synchronization for Multithreaded MPI Implementations. United States: N. p., 2017. Web.
Dang, Hoang-Vu, Seo, Sangmin, Amer, Abdelhalim, & Balaji, Pavan. (2017). Advanced Thread Synchronization for Multithreaded MPI Implementations. United States.
Dang, Hoang-Vu, Seo, Sangmin, Amer, Abdelhalim, and Balaji, Pavan. 2017. "Advanced Thread Synchronization for Multithreaded MPI Implementations". United States. https://www.osti.gov/servlets/purl/1364660.
@inproceedings{osti_1364660,
title = {Advanced Thread Synchronization for Multithreaded MPI Implementations},
author = {Dang, Hoang-Vu and Seo, Sangmin and Amer, Abdelhalim and Balaji, Pavan},
place = {United States},
year = {2017},
month = {may},
url = {https://www.osti.gov/servlets/purl/1364660}
}

Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.

Similar records:
  • We present a new lock-free parallel algorithm for computing betweenness centrality of massive small-world networks. With minor changes to the data structures, our algorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in the HPCS SSCA#2 Graph Analysis benchmark, which has been extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the ThreadStorm processor, and a single-socket Sun multicore server with the UltraSPARC T2 processor. For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, corresponding to more than a 2X performance improvement over previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network. (A sequential sketch of the betweenness-centrality kernel appears as the first example after this list.)
  • To make the most effective use of parallel machines that are being built out of increasingly large multicore chips, researchers are exploring the use of programming models comprising a mixture of MPI and threads. Such hybrid models require efficient support from an MPI implementation for MPI messages sent from multiple threads simultaneously. In this paper, we explore the issues involved in designing such an implementation. We present four approaches to building a fully thread-safe MPI implementation, with decreasing levels of critical-section granularity (from coarse-grain locks to fine-grain locks to lock-free operations) and correspondingly increasing levels of complexity. We describe how we have structured our implementation to support all four approaches and enable one to be selected at build time. We present performance results with a message-rate benchmark to demonstrate the performance implications of the different approaches. (The second example after this list sketches the application-facing side of such a hybrid setup.)
  • Presently, different MPI implementations cannot interoperate with each other. Distributed computing across different vendors' machines therefore requires that a single MPI implementation, such as MPICH, be used rather than the vendors' own optimized MPI implementations. This talk describes a software package called PVMPI that the authors are developing to allow interoperability among vendors' optimized MPI versions. Their approach builds on the proven and widely ported Parallel Virtual Machine (PVM). The use of PVMPI is transparent to MPI applications and allows intercommunication via all the MPI point-to-point calls. PVMPI allows more flexible control over MPI applications than is currently indicated by the MPI-2 forum by providing access to all the process control and resource control functions available in the PVM virtual machine. (The third example after this list illustrates comparable cross-application intercommunication using the standard MPI-2 port interface.)
  • As parallel systems are commonly being built out of increasingly large multicore chips, application programmers are exploring the use of hybrid programming models combining MPI across nodes and multithreading within a node. Many MPI implementations, however, are just starting to support multithreaded MPI communication, often focusing on correctness first and performance later. As a result, both users and implementers need some measure for evaluating the multithreaded performance of an MPI implementation. In this paper, we propose a number of performance tests that are motivated by typical application scenarios. These tests cover the overhead of providing the MPI_THREAD_MULTIPLE level of thread safety for user programs, the amount of concurrency in different threads making MPI calls, the ability to overlap communication with computation, and other features. We present performance results with this test suite on several platforms (Linux cluster, Sun and IBM SMPs) and MPI implementations (MPICH2, Open MPI, IBM, and Sun). (The fourth example after this list sketches a message-rate test of this kind.)
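
The lock-free parallel algorithm described in the first abstract builds on Brandes' betweenness-centrality kernel. As a point of reference only, the following is a minimal sequential sketch of that kernel in C for a small unweighted graph; the toy 5-vertex graph, the CSR adjacency arrays, and all identifiers are illustrative assumptions, and none of the lock-free synchronization or cache-locality optimizations claimed in the abstract are shown.

    /* Minimal sequential sketch of Brandes' betweenness-centrality kernel
     * (unweighted graph, CSR adjacency).  Illustrative only: the paper's
     * contribution is a lock-free *parallel* variant with better cache
     * locality, none of which appears here. */
    #include <stdio.h>

    #define N 5   /* vertices in the toy example graph */
    #define M 12  /* directed edges (both directions of 6 undirected edges) */

    /* CSR arrays for a small undirected example graph */
    static const int row[N + 1] = {0, 2, 5, 8, 10, 12};
    static const int col[M]     = {1, 2, 0, 2, 3, 0, 1, 4, 1, 4, 2, 3};

    int main(void) {
        double bc[N] = {0.0};

        for (int s = 0; s < N; s++) {
            int    stack[N], sp = 0;          /* vertices in non-decreasing distance */
            int    queue[N], qh = 0, qt = 0;  /* BFS queue                           */
            int    dist[N];
            double sigma[N], delta[N];
            int    pred[N][N], npred[N];      /* dense predecessor lists (toy sizes) */

            for (int v = 0; v < N; v++) {
                dist[v] = -1; sigma[v] = 0.0; delta[v] = 0.0; npred[v] = 0;
            }
            dist[s] = 0; sigma[s] = 1.0;
            queue[qt++] = s;

            /* forward BFS: count shortest paths */
            while (qh < qt) {
                int v = queue[qh++];
                stack[sp++] = v;
                for (int e = row[v]; e < row[v + 1]; e++) {
                    int w = col[e];
                    if (dist[w] < 0) {            /* w discovered for the first time */
                        dist[w] = dist[v] + 1;
                        queue[qt++] = w;
                    }
                    if (dist[w] == dist[v] + 1) { /* v lies on a shortest path to w */
                        sigma[w] += sigma[v];
                        pred[w][npred[w]++] = v;
                    }
                }
            }

            /* backward pass: accumulate dependencies */
            while (sp > 0) {
                int w = stack[--sp];
                for (int i = 0; i < npred[w]; i++) {
                    int v = pred[w][i];
                    delta[v] += (sigma[v] / sigma[w]) * (1.0 + delta[w]);
                }
                if (w != s) bc[w] += delta[w];
            }
        }

        for (int v = 0; v < N; v++)
            printf("bc[%d] = %.3f\n", v, bc[v] / 2.0);  /* halve for undirected graph */
        return 0;
    }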
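
For the second abstract, the following is a minimal sketch of the application-facing side of hybrid MPI+threads: it requests MPI_THREAD_MULTIPLE and lets each pthread issue its own point-to-point call. The paper's four approaches concern critical sections inside the MPI implementation; the comment about a single application-level mutex is only an analogy for coarse-grain locking. Thread count, tags, and message contents are arbitrary assumptions.

    /* Minimal sketch: threads of one MPI process issuing MPI calls
     * concurrently under MPI_THREAD_MULTIPLE.  Thread count, tags and
     * message sizes are arbitrary; error handling is omitted. */
    #include <mpi.h>
    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4

    static int rank, size;

    static void *worker(void *arg) {
        int tid = (int)(long)arg;
        int buf = rank * 100 + tid;

        if (size < 2) return NULL;

        if (rank == 0) {
            /* each thread posts its own send, distinguished by tag */
            MPI_Send(&buf, 1, MPI_INT, 1, /*tag=*/tid, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&buf, 1, MPI_INT, 0, /*tag=*/tid, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1, thread %d received %d\n", tid, buf);
        }
        /* A coarse-grain design would instead wrap every MPI call above in
         * pthread_mutex_lock(&mpi_lock); ... pthread_mutex_unlock(&mpi_lock);
         * serializing all threads through one critical section. */
        return NULL;
    }

    int main(int argc, char **argv) {
        int provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        if (provided < MPI_THREAD_MULTIPLE) {
            fprintf(stderr, "MPI_THREAD_MULTIPLE not available (got %d)\n", provided);
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        pthread_t th[NTHREADS];
        for (long t = 0; t < NTHREADS; t++)
            pthread_create(&th[t], NULL, worker, (void *)t);
        for (int t = 0; t < NTHREADS; t++)
            pthread_join(th[t], NULL);

        MPI_Finalize();
        return 0;
    }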
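
For the third abstract, PVMPI bridges vendor MPIs through PVM, which cannot be reproduced here; as a rough analogy only, the sketch below uses the standard MPI-2 port interface (MPI_Open_port, MPI_Comm_accept, MPI_Comm_connect) to show what point-to-point intercommunication between two separately started MPI applications looks like. Passing the port string on the command line is an assumption made for illustration.

    /* Minimal sketch of cross-application intercommunication using the
     * standard MPI-2 port interface.  PVMPI itself works through PVM,
     * not through these calls; this only illustrates the kind of
     * intercommunicator point-to-point traffic described above.
     * Run one copy as "server" and pass its printed port string to a
     * second copy as "client <port>". */
    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv) {
        MPI_Comm inter;
        char port[MPI_MAX_PORT_NAME];
        int data;

        MPI_Init(&argc, &argv);

        if (argc > 1 && strcmp(argv[1], "server") == 0) {
            MPI_Open_port(MPI_INFO_NULL, port);
            printf("port: %s\n", port);
            fflush(stdout);
            MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
            MPI_Recv(&data, 1, MPI_INT, 0, 0, inter, MPI_STATUS_IGNORE);
            printf("server received %d from the other application\n", data);
            MPI_Close_port(port);
        } else if (argc > 2 && strcmp(argv[1], "client") == 0) {
            strncpy(port, argv[2], MPI_MAX_PORT_NAME - 1);
            port[MPI_MAX_PORT_NAME - 1] = '\0';
            MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
            data = 42;
            /* ordinary point-to-point call, but across application boundaries */
            MPI_Send(&data, 1, MPI_INT, 0, 0, inter);
        } else {
            fprintf(stderr, "usage: %s server | client <port>\n", argv[0]);
            MPI_Finalize();
            return 1;
        }

        MPI_Comm_disconnect(&inter);
        MPI_Finalize();
        return 0;
    }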
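
For the fourth abstract, the following is a minimal sketch in the spirit of a multithreaded message-rate test: each thread of rank 0 streams small messages to the matching thread of rank 1 under MPI_THREAD_MULTIPLE, one tag per thread, and the aggregate rate is reported. Iteration count, thread count, and message size are arbitrary assumptions; the paper's actual test suite is not reproduced.

    /* Minimal sketch of a multithreaded message-rate test: NTHREADS threads
     * per process pump messages from rank 0 to rank 1 concurrently under
     * MPI_THREAD_MULTIPLE, one tag per thread.  Parameters are arbitrary. */
    #include <mpi.h>
    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define NITER    10000

    static int rank;

    static void *pump(void *arg) {
        int tid = (int)(long)arg;
        char byte = 0;

        for (int i = 0; i < NITER; i++) {
            if (rank == 0)
                MPI_Send(&byte, 1, MPI_CHAR, 1, tid, MPI_COMM_WORLD);
            else
                MPI_Recv(&byte, 1, MPI_CHAR, 0, tid, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
        }
        return NULL;
    }

    int main(int argc, char **argv) {
        int provided, size;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (provided < MPI_THREAD_MULTIPLE || size != 2) {
            if (rank == 0)
                fprintf(stderr, "needs exactly 2 ranks and MPI_THREAD_MULTIPLE\n");
            MPI_Finalize();
            return 1;
        }

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();

        pthread_t th[NTHREADS];
        for (long t = 0; t < NTHREADS; t++)
            pthread_create(&th[t], NULL, pump, (void *)t);
        for (int t = 0; t < NTHREADS; t++)
            pthread_join(th[t], NULL);

        MPI_Barrier(MPI_COMM_WORLD);
        double t1 = MPI_Wtime();

        if (rank == 0)
            printf("%.0f messages/s aggregate over %d threads\n",
                   (double)NTHREADS * NITER / (t1 - t0), NTHREADS);

        MPI_Finalize();
        return 0;
    }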