U.S. Department of Energy
Office of Scientific and Technical Information

Tail queues: A multi-threaded matching architecture

Journal Article · Concurrency and Computation. Practice and Experience
DOI: https://doi.org/10.1002/cpe.5158 · OSTI ID: 1496973
 [1];  [1];  [1];  [2]
  1. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Univ. of New Mexico, Albuquerque, NM (United States)
  2. Univ. of New Mexico, Albuquerque, NM (United States)

As we approach exascale, computational parallelism will have to drastically increase in order to meet throughput targets. Many-core architectures have exacerbated this problem by trading reduced clock speeds, core complexity, and computation throughput for increasing parallelism. This presents two major challenges for communication libraries such as MPI: the library must leverage the performance advantages of thread-level parallelism and avoid the scalability problems associated with increasing the number of processes to that scale. Hybrid programming models, such as MPI+X, have been proposed to address these challenges. MPI_THREAD_MULTIPLE is MPI's thread-safe mode. While there has been work to optimize it, it largely remains non-performant in most implementations. While current applications avoid MPI multithreading due to performance concerns, it is expected to be utilized in future applications. One of the major synchronous data structures required by MPI is the matching engine. In this paper, we present a parallel matching algorithm that can improve MPI matching for multithreaded applications. We then perform a feasibility study to demonstrate the performance benefit of the technique.

Research Organization:
Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
AC04-94AL85000
OSTI ID:
1496973
Alternate ID(s):
OSTI ID: 1511804
Report Number(s):
SAND--2019-1466J; 672473
Journal Information:
Concurrency and Computation. Practice and Experience, Vol. 32, Issue 3; ISSN 1532-0626
Publisher:
Wiley
Country of Publication:
United States
Language:
English

Cited By (1)

PAMPAR: A new parallel benchmark for performance and energy consumption evaluation
  • Marques Garcia, Adriano; Schepke, Claudio; Girardi, Alessandro
  • Concurrency and Computation: Practice and Experience, Vol. 32, Issue 20 https://doi.org/10.1002/cpe.5504
journal October 2019