OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Tail queues: A multi-threaded matching architecture

Abstract

As we approach exascale, computational parallelism will have to increase drastically in order to meet throughput targets. Many-core architectures have exacerbated this problem by accepting reduced clock speeds, core complexity, and computation throughput in exchange for increased parallelism. This presents two major challenges for communication libraries such as MPI: the library must leverage the performance advantages of thread-level parallelism, and it must avoid the scalability problems associated with increasing the number of processes to that scale. Hybrid programming models, such as MPI+X, have been proposed to address these challenges. MPI_THREAD_MULTIPLE is MPI's thread-safe mode. Although there has been work to optimize it, it remains largely non-performant in most implementations. Current applications avoid MPI multithreading because of performance concerns, but future applications are expected to use it. One of the major synchronous data structures required by MPI is the matching engine. In this paper, we present a parallel matching algorithm that can improve MPI matching for multithreaded applications. We then perform a feasibility study to demonstrate the performance benefit of the technique.
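
For readers unfamiliar with the term, the following is a minimal sequential sketch of what an MPI-style matching engine does in general. It is not the parallel algorithm presented in the paper. The class name, method names, and wildcard constants are hypothetical stand-ins chosen for illustration; only the matching rule (communicator, source, and tag, with receive-side wildcards, resolved in posted order) reflects standard MPI semantics.

```python
from collections import deque

ANY_SOURCE = -1  # illustrative stand-in for MPI_ANY_SOURCE
ANY_TAG = -1     # illustrative stand-in for MPI_ANY_TAG


class MatchingEngine:
    """Sequential sketch of MPI-style matching (hypothetical class)."""

    def __init__(self):
        self.posted = deque()      # receives waiting for a message
        self.unexpected = deque()  # messages waiting for a receive

    @staticmethod
    def _matches(recv, msg):
        # A receive matches on communicator, source, and tag; source and
        # tag may be wildcards on the receive side only.
        r_comm, r_src, r_tag = recv
        m_comm, m_src, m_tag = msg
        return (r_comm == m_comm
                and r_src in (ANY_SOURCE, m_src)
                and r_tag in (ANY_TAG, m_tag))

    def post_recv(self, comm, src, tag):
        """Search the unexpected queue first; otherwise queue the receive."""
        recv = (comm, src, tag)
        for msg in self.unexpected:
            if self._matches(recv, msg):
                self.unexpected.remove(msg)
                return msg
        self.posted.append(recv)
        return None

    def arrive(self, comm, src, tag):
        """Match an incoming message against the oldest eligible receive."""
        msg = (comm, src, tag)
        for recv in self.posted:
            if self._matches(recv, msg):
                self.posted.remove(recv)
                return recv
        self.unexpected.append(msg)
        return None
```

In a multithreaded setting, both queues in a design like this would need to be protected against concurrent access, for example with a single lock. That serialization is the kind of cost a parallel matching algorithm aims to reduce.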

Authors:
Dosanjh, Matthew G. F. [1]; Grant, Ryan E. [1]; Schonbein, Whit [1]; Bridges, Patrick G. [2]
  1. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Univ. of New Mexico, Albuquerque, NM (United States)
  2. Univ. of New Mexico, Albuquerque, NM (United States)
Publication Date:
2019-02-06
Research Org.:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA)
OSTI Identifier:
1496973
Report Number(s):
SAND-2019-1466J
Journal ID: ISSN 1532-0626; 672473
Grant/Contract Number:  
AC04-94AL85000
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Concurrency and Computation. Practice and Experience
Additional Journal Information:
Journal Volume: 32; Journal Issue: 3; Journal ID: ISSN 1532-0626
Publisher:
Wiley
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; high performance computing; many core; MPI; networks

Citation Formats

Dosanjh, Matthew G. F., Grant, Ryan E., Schonbein, Whit, and Bridges, Patrick G. Tail queues: A multi-threaded matching architecture. United States: N. p., 2019. Web. doi:10.1002/cpe.5158.
Dosanjh, Matthew G. F., Grant, Ryan E., Schonbein, Whit, & Bridges, Patrick G. Tail queues: A multi-threaded matching architecture. United States. https://doi.org/10.1002/cpe.5158
Dosanjh, Matthew G. F., Grant, Ryan E., Schonbein, Whit, and Bridges, Patrick G. 2019. "Tail queues: A multi-threaded matching architecture". United States. https://doi.org/10.1002/cpe.5158. https://www.osti.gov/servlets/purl/1496973.
@article{osti_1496973,
title = {Tail queues: A multi-threaded matching architecture},
author = {Dosanjh, Matthew G. F. and Grant, Ryan E. and Schonbein, Whit and Bridges, Patrick G.},
abstractNote = {As we approach exascale, computational parallelism will have to increase drastically in order to meet throughput targets. Many-core architectures have exacerbated this problem by accepting reduced clock speeds, core complexity, and computation throughput in exchange for increased parallelism. This presents two major challenges for communication libraries such as MPI: the library must leverage the performance advantages of thread-level parallelism, and it must avoid the scalability problems associated with increasing the number of processes to that scale. Hybrid programming models, such as MPI+X, have been proposed to address these challenges. MPI_THREAD_MULTIPLE is MPI's thread-safe mode. Although there has been work to optimize it, it remains largely non-performant in most implementations. Current applications avoid MPI multithreading because of performance concerns, but future applications are expected to use it. One of the major synchronous data structures required by MPI is the matching engine. In this paper, we present a parallel matching algorithm that can improve MPI matching for multithreaded applications. We then perform a feasibility study to demonstrate the performance benefit of the technique.},
doi = {10.1002/cpe.5158},
url = {https://www.osti.gov/biblio/1496973},
journal = {Concurrency and Computation. Practice and Experience},
issn = {1532-0626},
number = 3,
volume = 32,
place = {United States},
year = {2019},
month = {2}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 4 works
Citation information provided by
Web of Science

Works referencing / citing this record:

PAMPAR: A new parallel benchmark for performance and energy consumption evaluation
journal, October 2019

  • Marques Garcia, Adriano; Schepke, Claudio; Girardi, Alessandro
  • Concurrency and Computation: Practice and Experience, Vol. 32, Issue 20
  • https://doi.org/10.1002/cpe.5504