OSTI.GOV — U.S. Department of Energy
Office of Scientific and Technical Information

Title: Tail queues: A multi-threaded matching architecture

Abstract

As we approach exascale, computational parallelism will have to increase drastically to meet throughput targets. Many-core architectures have exacerbated this problem by trading reduced clock speeds, core complexity, and per-core computational throughput for increased parallelism. This presents two major challenges for communication libraries such as MPI: the library must leverage the performance advantages of thread-level parallelism while avoiding the scalability problems of increasing the number of processes to that scale. Hybrid programming models, such as MPI+X, have been proposed to address these challenges. MPI_THREAD_MULTIPLE is MPI's fully thread-safe mode. While there has been work to optimize it, it remains non-performant in most implementations. Although current applications avoid MPI multithreading due to performance concerns, it is expected to be utilized in future applications. One of the major synchronized data structures required by MPI is the matching engine. In this paper, we present a parallel matching algorithm that can improve MPI matching for multithreaded applications. We then perform a feasibility study to demonstrate the performance benefit of the technique.
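For context, the "matching engine" the abstract refers to pairs each incoming message with a posted receive by (source, tag), falling back to an unexpected-message queue. The sketch below is a generic, single-lock illustration of that idea in Python — not the paper's tail-queue algorithm — and the names (`MatchingEngine`, `ANY`) are hypothetical stand-ins for MPI's internal structures and wildcards. The single lock is exactly the serialization point that parallel matching schemes aim to relax.

```python
import threading

# Hypothetical stand-in for MPI_ANY_SOURCE / MPI_ANY_TAG wildcards.
ANY = object()

class MatchingEngine:
    """Toy MPI-style matching engine: a posted-receive queue and an
    unexpected-message queue, serialized by one global lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._posted = []      # pending receives: (source, tag)
        self._unexpected = []  # arrived messages: (source, tag, payload)

    @staticmethod
    def _matches(want_src, want_tag, src, tag):
        return (want_src is ANY or want_src == src) and \
               (want_tag is ANY or want_tag == tag)

    def post_recv(self, source, tag):
        """Post a receive; return the payload if a matching message
        already arrived, else None (the receive stays queued)."""
        with self._lock:
            for i, (src, t, payload) in enumerate(self._unexpected):
                if self._matches(source, tag, src, t):
                    del self._unexpected[i]
                    return payload
            self._posted.append((source, tag))
            return None

    def deliver(self, src, tag, payload):
        """Handle an incoming message: match it against posted receives
        in order, otherwise park it on the unexpected queue."""
        with self._lock:
            for i, (want_src, want_tag) in enumerate(self._posted):
                if self._matches(want_src, want_tag, src, tag):
                    del self._posted[i]
                    return (want_src, want_tag)  # the matched receive
            self._unexpected.append((src, tag, payload))
            return None
```

Because every post and delivery takes the same lock, threads calling into the engine serialize even when their messages could never match each other — the contention the paper's parallel matching architecture targets.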

Authors:
Dosanjh, Matthew G. F. [1]; Grant, Ryan E. [1]; Schonbein, Whit [1]; Bridges, Patrick G. [2]
  1. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Univ. of New Mexico, Albuquerque, NM (United States)
  2. Univ. of New Mexico, Albuquerque, NM (United States)
Publication Date:
February 2019
Research Org.:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA)
OSTI Identifier:
1496973
Report Number(s):
SAND-2019-1466J
Journal ID: ISSN 1532-0626; 672473
Grant/Contract Number:  
AC04-94AL85000
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Concurrency and Computation: Practice and Experience
Additional Journal Information:
Journal Volume: 32; Journal Issue: 3; Journal ID: ISSN 1532-0626
Publisher:
Wiley
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; high performance computing; many core; MPI; networks

Citation Formats

Dosanjh, Matthew G. F., Grant, Ryan E., Schonbein, Whit, and Bridges, Patrick G. Tail queues: A multi-threaded matching architecture. United States: N. p., 2019. Web. doi:10.1002/cpe.5158.
Dosanjh, Matthew G. F., Grant, Ryan E., Schonbein, Whit, & Bridges, Patrick G. Tail queues: A multi-threaded matching architecture. United States. doi:10.1002/cpe.5158.
Dosanjh, Matthew G. F., Grant, Ryan E., Schonbein, Whit, and Bridges, Patrick G. 2019. "Tail queues: A multi-threaded matching architecture". United States. doi:10.1002/cpe.5158. https://www.osti.gov/servlets/purl/1496973.
@article{osti_1496973,
title = {Tail queues: A multi-threaded matching architecture},
author = {Dosanjh, Matthew G. F. and Grant, Ryan E. and Schonbein, Whit and Bridges, Patrick G.},
abstractNote = {As we approach exascale, computational parallelism will have to increase drastically to meet throughput targets. Many-core architectures have exacerbated this problem by trading reduced clock speeds, core complexity, and per-core computational throughput for increased parallelism. This presents two major challenges for communication libraries such as MPI: the library must leverage the performance advantages of thread-level parallelism while avoiding the scalability problems of increasing the number of processes to that scale. Hybrid programming models, such as MPI+X, have been proposed to address these challenges. MPI_THREAD_MULTIPLE is MPI's fully thread-safe mode. While there has been work to optimize it, it remains non-performant in most implementations. Although current applications avoid MPI multithreading due to performance concerns, it is expected to be utilized in future applications. One of the major synchronized data structures required by MPI is the matching engine. In this paper, we present a parallel matching algorithm that can improve MPI matching for multithreaded applications. We then perform a feasibility study to demonstrate the performance benefit of the technique.},
doi = {10.1002/cpe.5158},
journal = {Concurrency and Computation: Practice and Experience},
issn = {1532-0626},
number = 3,
volume = 32,
place = {United States},
year = {2019},
month = {2}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 1 work
Citation information provided by
Web of Science


Works referenced in this record:

MPI+Threads: runtime contention and remedies
journal, January 2015


Fine-Grained Multithreading Support for Hybrid Threaded MPI Programming
journal, February 2010

  • Balaji, Pavan; Buntinas, Darius; Goodell, David
  • The International Journal of High Performance Computing Applications, Vol. 24, Issue 1
  • DOI: 10.1177/1094342009360206

Improving concurrency and asynchrony in multithreaded MPI applications using software offloading
conference, January 2015

  • Vaidyanathan, Karthikeyan; Kalamkar, Dhiraj D.; Pamnany, Kiran
  • Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '15
  • DOI: 10.1145/2807591.2807602

Early Experiences Co-Scheduling Work and Communication Tasks for Hybrid MPI+X Applications
conference, November 2014

  • Stark, Dylan T.; Barrett, Richard F.; Grant, Ryan E.
  • 2014 Workshop on Exascale MPI at Supercomputing Conference (ExaMPI)
  • DOI: 10.1109/ExaMPI.2014.6

An evaluation of MPI message rate on hybrid-core processors
journal, November 2014

  • Barrett, Brian W.; Brightwell, Ron; Grant, Ryan
  • The International Journal of High Performance Computing Applications, Vol. 28, Issue 4
  • DOI: 10.1177/1094342014552085

Adaptive and Dynamic Design for MPI Tag Matching
conference, September 2016

  • Bayatpour, M.; Subramoni, H.; Chakraborty, S.
  • 2016 IEEE International Conference on Cluster Computing (CLUSTER)
  • DOI: 10.1109/CLUSTER.2016.69

Relaxations for High-Performance Message Passing on Massively Parallel SIMT Processors
conference, May 2017

  • Klenk, Benjamin; Froening, Holger; Eberle, Hans
  • 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
  • DOI: 10.1109/IPDPS.2017.94

FG-MPI: Fine-grain MPI for multicore and clusters
conference, April 2010

  • Kamal, Humaira; Wagner, Alan
  • 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)
  • DOI: 10.1109/IPDPSW.2010.5470773

Knights Landing (KNL): 2nd Generation Intel® Xeon Phi processor
conference, August 2015


Enabling communication concurrency through flexible MPI endpoints
journal, September 2014

  • Dinan, James; Grant, Ryan E.; Balaji, Pavan
  • The International Journal of High Performance Computing Applications, Vol. 28, Issue 4
  • DOI: 10.1177/1094342014548772

Synchronization without contention
journal, April 1991

  • Mellor-Crummey, John M.; Scott, Michael L.
  • ACM SIGPLAN Notices, Vol. 26, Issue 4
  • DOI: 10.1145/106973.106999

Characterizing MPI matching via trace-based simulation
conference, January 2017

  • Ferreira, Kurt B.; Levy, Scott; Pedretti, Kevin
  • Proceedings of the 24th European MPI Users' Group Meeting on - EuroMPI '17
  • DOI: 10.1145/3127024.3127040

Thread-safety in an MPI implementation: Requirements and analysis
journal, September 2007


Multigrid Smoothers for Ultraparallel Computing
journal, January 2011

  • Baker, Allison H.; Falgout, Robert D.; Kolev, Tzanio V.
  • SIAM Journal on Scientific Computing, Vol. 33, Issue 5
  • DOI: 10.1137/100798806

Enabling Efficient Multithreaded MPI Communication through a Library-Based Implementation of MPI Endpoints
conference, November 2014

  • Sridharan, Srinivas; Dinan, James; Kalamkar, Dhiraj D.
  • SC14: International Conference for High Performance Computing, Networking, Storage and Analysis
  • DOI: 10.1109/SC.2014.45

Scalable parallel programming with CUDA introduction
conference, August 2008


A high-performance, portable implementation of the MPI message passing interface standard
journal, September 1996


A fast and resource-conscious MPI message queue mechanism for large-scale jobs
journal, January 2014


CHARM++: a portable concurrent object oriented system based on C++
conference, January 1993

  • Kale, Laxmikant V.; Krishnan, Sanjeev
  • Proceedings of the eighth annual conference on Object-oriented programming systems, languages, and applications - OOPSLA '93
  • DOI: 10.1145/165854.165874