Tail queues: A multi-threaded matching architecture
Abstract
As we approach exascale, computational parallelism will have to increase drastically in order to meet throughput targets. Many-core architectures have exacerbated this problem by trading reduced clock speeds, core complexity, and computation throughput for increased parallelism. This presents two major challenges for communication libraries such as MPI: the library must leverage the performance advantages of thread-level parallelism and avoid the scalability problems associated with increasing the number of processes to that scale. Hybrid programming models, such as MPI+X, have been proposed to address these challenges. MPI_THREAD_MULTIPLE is MPI's thread-safe mode. While there has been work to optimize it, it largely remains non-performant in most implementations. Although current applications avoid MPI multithreading due to performance concerns, it is expected to be utilized in future applications. One of the major synchronous data structures required by MPI is the matching engine. In this paper, we present a parallel matching algorithm that can improve MPI matching for multithreaded applications. We then perform a feasibility study to demonstrate the performance benefit of the technique.
- Authors:
- Dosanjh, Matthew G. F.; Grant, Ryan E.; Schonbein, Whit; Bridges, Patrick G.
- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Univ. of New Mexico, Albuquerque, NM (United States)
- Univ. of New Mexico, Albuquerque, NM (United States)
- Publication Date:
- 2019-02-06
- Research Org.:
- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
- Sponsoring Org.:
- USDOE National Nuclear Security Administration (NNSA)
- OSTI Identifier:
- 1496973
- Report Number(s):
- SAND-2019-1466J
- Journal ID: ISSN 1532-0626; 672473
- Grant/Contract Number:
- AC04-94AL85000
- Resource Type:
- Journal Article: Accepted Manuscript
- Journal Name:
- Concurrency and Computation. Practice and Experience
- Additional Journal Information:
- Journal Volume: 32; Journal Issue: 3; Journal ID: ISSN 1532-0626
- Publisher:
- Wiley
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; high performance computing; many core; MPI; networks
Citation Formats
Dosanjh, Matthew G. F., Grant, Ryan E., Schonbein, Whit, and Bridges, Patrick G. Tail queues: A multi-threaded matching architecture. United States: N. p., 2019. Web. doi:10.1002/cpe.5158.
Dosanjh, Matthew G. F., Grant, Ryan E., Schonbein, Whit, & Bridges, Patrick G. Tail queues: A multi-threaded matching architecture. United States. https://doi.org/10.1002/cpe.5158
Dosanjh, Matthew G. F., Grant, Ryan E., Schonbein, Whit, and Bridges, Patrick G. 2019. "Tail queues: A multi-threaded matching architecture". United States. https://doi.org/10.1002/cpe.5158. https://www.osti.gov/servlets/purl/1496973.
@article{osti_1496973,
title = {Tail queues: A multi-threaded matching architecture},
author = {Dosanjh, Matthew G. F. and Grant, Ryan E. and Schonbein, Whit and Bridges, Patrick G.},
abstractNote = {As we approach exascale, computational parallelism will have to increase drastically in order to meet throughput targets. Many-core architectures have exacerbated this problem by trading reduced clock speeds, core complexity, and computation throughput for increased parallelism. This presents two major challenges for communication libraries such as MPI: the library must leverage the performance advantages of thread-level parallelism and avoid the scalability problems associated with increasing the number of processes to that scale. Hybrid programming models, such as MPI+X, have been proposed to address these challenges. MPI_THREAD_MULTIPLE is MPI's thread-safe mode. While there has been work to optimize it, it largely remains non-performant in most implementations. Although current applications avoid MPI multithreading due to performance concerns, it is expected to be utilized in future applications. One of the major synchronous data structures required by MPI is the matching engine. In this paper, we present a parallel matching algorithm that can improve MPI matching for multithreaded applications. We then perform a feasibility study to demonstrate the performance benefit of the technique.},
doi = {10.1002/cpe.5158},
url = {https://www.osti.gov/biblio/1496973},
journal = {Concurrency and Computation. Practice and Experience},
issn = {1532-0626},
number = 3,
volume = 32,
place = {United States},
year = {2019},
month = {2}
}