OSTI.GOV — U.S. Department of Energy
Office of Scientific and Technical Information

Title: Tail queues: A multi-threaded matching architecture

Abstract

As we approach exascale, computational parallelism will have to increase drastically to meet throughput targets. Many-core architectures have exacerbated this problem by trading reduced clock speeds, core complexity, and per-core computational throughput for increased parallelism. This presents two major challenges for communication libraries such as MPI: the library must leverage the performance advantages of thread-level parallelism while avoiding the scalability problems of increasing the number of processes to that scale. Hybrid programming models, such as MPI+X, have been proposed to address these challenges. MPI_THREAD_MULTIPLE is MPI's fully thread-safe mode. While there has been work to optimize it, it remains non-performant in most implementations. Although current applications avoid MPI multithreading due to performance concerns, it is expected to be utilized in future applications. One of the major synchronized data structures required by MPI is the matching engine. In this paper, we present a parallel matching algorithm that can improve MPI matching for multithreaded applications. We then perform a feasibility study to demonstrate the performance benefit of the technique.
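For context, the "matching engine" the abstract refers to pairs each incoming message with a posted receive by (source, tag), falling back to an unexpected-message queue. The sketch below is a generic, single-lock illustration of that idea in Python — not the paper's tail-queue algorithm — and the names (`MatchingEngine`, `ANY`) are hypothetical stand-ins for MPI's internal structures and wildcards. The single lock is exactly the serialization point that parallel matching schemes aim to relax.

```python
import threading

# Hypothetical stand-in for MPI_ANY_SOURCE / MPI_ANY_TAG wildcards.
ANY = object()

class MatchingEngine:
    """Toy MPI-style matching engine: a posted-receive queue and an
    unexpected-message queue, serialized by one global lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._posted = []      # pending receives: (source, tag)
        self._unexpected = []  # arrived messages: (source, tag, payload)

    @staticmethod
    def _matches(want_src, want_tag, src, tag):
        return (want_src is ANY or want_src == src) and \
               (want_tag is ANY or want_tag == tag)

    def post_recv(self, source, tag):
        """Post a receive; return the payload if a matching message
        already arrived, else None (the receive stays queued)."""
        with self._lock:
            for i, (src, t, payload) in enumerate(self._unexpected):
                if self._matches(source, tag, src, t):
                    del self._unexpected[i]
                    return payload
            self._posted.append((source, tag))
            return None

    def deliver(self, src, tag, payload):
        """Handle an incoming message: match it against posted receives
        in order, otherwise park it on the unexpected queue."""
        with self._lock:
            for i, (want_src, want_tag) in enumerate(self._posted):
                if self._matches(want_src, want_tag, src, tag):
                    del self._posted[i]
                    return (want_src, want_tag)  # the matched receive
            self._unexpected.append((src, tag, payload))
            return None
```

Because every post and delivery takes the same lock, threads calling into the engine serialize even when their messages could never match each other — the contention the paper's parallel matching architecture targets.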

Authors:
Dosanjh, Matthew G. F. [1]; Grant, Ryan E. [1]; Schonbein, Whit [1]; Bridges, Patrick G. [2]
  1. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Univ. of New Mexico, Albuquerque, NM (United States)
  2. Univ. of New Mexico, Albuquerque, NM (United States)
Publication Date:
February 2019
Research Org.:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA)
OSTI Identifier:
1496973
Report Number(s):
SAND-2019-1466J
Journal ID: ISSN 1532-0626; 672473
Grant/Contract Number:  
AC04-94AL85000
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Concurrency and Computation: Practice and Experience
Additional Journal Information:
Journal Volume: 32; Journal Issue: 3; Journal ID: ISSN 1532-0626
Publisher:
Wiley
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; high performance computing; many core; MPI; networks

Citation Formats

Dosanjh, Matthew G. F., Grant, Ryan E., Schonbein, Whit, and Bridges, Patrick G. Tail queues: A multi-threaded matching architecture. United States: N. p., 2019. Web. doi:10.1002/cpe.5158.
Dosanjh, Matthew G. F., Grant, Ryan E., Schonbein, Whit, & Bridges, Patrick G. Tail queues: A multi-threaded matching architecture. United States. doi:10.1002/cpe.5158.
Dosanjh, Matthew G. F., Grant, Ryan E., Schonbein, Whit, and Bridges, Patrick G. 2019. "Tail queues: A multi-threaded matching architecture". United States. doi:10.1002/cpe.5158. https://www.osti.gov/servlets/purl/1496973.
@article{osti_1496973,
title = {Tail queues: A multi-threaded matching architecture},
author = {Dosanjh, Matthew G. F. and Grant, Ryan E. and Schonbein, Whit and Bridges, Patrick G.},
abstractNote = {As we approach exascale, computational parallelism will have to increase drastically to meet throughput targets. Many-core architectures have exacerbated this problem by trading reduced clock speeds, core complexity, and per-core computational throughput for increased parallelism. This presents two major challenges for communication libraries such as MPI: the library must leverage the performance advantages of thread-level parallelism while avoiding the scalability problems of increasing the number of processes to that scale. Hybrid programming models, such as MPI+X, have been proposed to address these challenges. MPI_THREAD_MULTIPLE is MPI's fully thread-safe mode. While there has been work to optimize it, it remains non-performant in most implementations. Although current applications avoid MPI multithreading due to performance concerns, it is expected to be utilized in future applications. One of the major synchronized data structures required by MPI is the matching engine. In this paper, we present a parallel matching algorithm that can improve MPI matching for multithreaded applications. We then perform a feasibility study to demonstrate the performance benefit of the technique.},
doi = {10.1002/cpe.5158},
journal = {Concurrency and Computation: Practice and Experience},
issn = {1532-0626},
number = 3,
volume = 32,
place = {United States},
year = {2019},
month = {2}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 1 work
Citation information provided by
Web of Science


Works referenced in this record:

MPI+Threads: runtime contention and remedies
journal, January 2015


Fine-Grained Multithreading Support for Hybrid Threaded MPI Programming
journal, February 2010

  • Balaji, Pavan; Buntinas, Darius; Goodell, David
  • The International Journal of High Performance Computing Applications, Vol. 24, Issue 1
  • DOI: 10.1177/1094342009360206

Improving concurrency and asynchrony in multithreaded MPI applications using software offloading
conference, January 2015

  • Vaidyanathan, Karthikeyan; Kalamkar, Dhiraj D.; Pamnany, Kiran
  • Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '15
  • DOI: 10.1145/2807591.2807602

Early Experiences Co-Scheduling Work and Communication Tasks for Hybrid MPI+X Applications
conference, November 2014

  • Stark, Dylan T.; Barrett, Richard F.; Grant, Ryan E.
  • 2014 Workshop on Exascale MPI at Supercomputing Conference (ExaMPI)
  • DOI: 10.1109/ExaMPI.2014.6

An evaluation of MPI message rate on hybrid-core processors
journal, November 2014

  • Barrett, Brian W.; Brightwell, Ron; Grant, Ryan
  • The International Journal of High Performance Computing Applications, Vol. 28, Issue 4
  • DOI: 10.1177/1094342014552085

Adaptive and Dynamic Design for MPI Tag Matching
conference, September 2016

  • Bayatpour, M.; Subramoni, H.; Chakraborty, S.
  • 2016 IEEE International Conference on Cluster Computing (CLUSTER)
  • DOI: 10.1109/CLUSTER.2016.69

Relaxations for High-Performance Message Passing on Massively Parallel SIMT Processors
conference, May 2017

  • Klenk, Benjamin; Froening, Holger; Eberle, Hans
  • 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
  • DOI: 10.1109/IPDPS.2017.94

FG-MPI: Fine-grain MPI for multicore and clusters
conference, April 2010

  • Kamal, Humaira; Wagner, Alan
  • 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)
  • DOI: 10.1109/IPDPSW.2010.5470773

Knights Landing (KNL): 2nd Generation Intel® Xeon Phi processor
conference, August 2015


Enabling communication concurrency through flexible MPI endpoints
journal, September 2014

  • Dinan, James; Grant, Ryan E.; Balaji, Pavan
  • The International Journal of High Performance Computing Applications, Vol. 28, Issue 4
  • DOI: 10.1177/1094342014548772

Synchronization without contention
journal, April 1991

  • Mellor-Crummey, John M.; Scott, Michael L.
  • ACM SIGPLAN Notices, Vol. 26, Issue 4
  • DOI: 10.1145/106973.106999

Characterizing MPI matching via trace-based simulation
conference, January 2017

  • Ferreira, Kurt B.; Levy, Scott; Pedretti, Kevin
  • Proceedings of the 24th European MPI Users' Group Meeting on - EuroMPI '17
  • DOI: 10.1145/3127024.3127040

Thread-safety in an MPI implementation: Requirements and analysis
journal, September 2007


Multigrid Smoothers for Ultraparallel Computing
journal, January 2011

  • Baker, Allison H.; Falgout, Robert D.; Kolev, Tzanio V.
  • SIAM Journal on Scientific Computing, Vol. 33, Issue 5
  • DOI: 10.1137/100798806

Enabling Efficient Multithreaded MPI Communication through a Library-Based Implementation of MPI Endpoints
conference, November 2014

  • Sridharan, Srinivas; Dinan, James; Kalamkar, Dhiraj D.
  • SC14: International Conference for High Performance Computing, Networking, Storage and Analysis
  • DOI: 10.1109/SC.2014.45

Scalable parallel programming with CUDA introduction
conference, August 2008


A high-performance, portable implementation of the MPI message passing interface standard
journal, September 1996


A fast and resource-conscious MPI message queue mechanism for large-scale jobs
journal, January 2014


CHARM++: a portable concurrent object oriented system based on C++
conference, January 1993

  • Kale, Laxmikant V.; Krishnan, Sanjeev
  • Proceedings of the eighth annual conference on Object-oriented programming systems, languages, and applications - OOPSLA '93
  • DOI: 10.1145/165854.165874