DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

This content will become publicly available on March 12, 2020

Title: Geometric Mapping of Tasks to Processors on Parallel Computers with Mesh or Torus Networks

Abstract

We present a new method for reducing parallel applications' communication time by mapping their MPI tasks to processors in a way that lowers the distance messages travel and the amount of congestion in the network. Assuming geometric proximity among the tasks is a good approximation of their communication interdependence, we use a geometric partitioning algorithm to order both the tasks and the processors, assigning task parts to the corresponding processor parts. In this way, interdependent tasks are assigned to “nearby” cores in the network. We also present a number of algorithmic optimizations that exploit specific features of the network or application to further improve the quality of the mapping. We specifically address the case of sparse node allocation, where the nodes assigned to a job are not necessarily located in a contiguous block nor within close proximity to each other in the network. Furthermore, our methods generalize to contiguous allocations as well, and results are shown for both contiguous and non-contiguous allocations. We show that, for the structured finite difference mini-application MiniGhost, our mapping methods reduced communication time up to 75% relative to MiniGhost's default mapping on 128K cores of a Cray XK7 with sparse allocation. For the atmospheric modeling code E3SM/HOMME, our methods reduced communication time up to 31% on 16K cores of an IBM BlueGene/Q with contiguous allocation.
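The core idea in the abstract, ordering tasks and processors with the same geometric partitioner and pairing corresponding parts, can be illustrated with a minimal sketch. This is not the authors' implementation: it substitutes a plain recursive coordinate bisection for the partitioner, assumes exactly one task per processor, and the names `rcb_order` and `geometric_map` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def rcb_order(points: Sequence[Point]) -> List[int]:
    """Return point indices ordered by recursive coordinate bisection.

    Illustrative stand-in for a geometric partitioner (an assumption,
    not the partitioner used in the paper).
    """

    def recurse(ids: List[int]) -> List[int]:
        if len(ids) <= 1:
            return ids

        # Cut along the dimension with the largest coordinate extent.
        def extent(d: int) -> float:
            vals = [points[i][d] for i in ids]
            return max(vals) - min(vals)

        cut = max(range(len(points[ids[0]])), key=extent)
        # Sort along the cut dimension; break ties by index for determinism.
        ids = sorted(ids, key=lambda i: (points[i][cut], i))
        mid = len(ids) // 2
        return recurse(ids[:mid]) + recurse(ids[mid:])

    return recurse(list(range(len(points))))


def geometric_map(task_xyz: Sequence[Point], proc_xyz: Sequence[Point]) -> List[int]:
    """Return a list where entry t is the processor assigned to task t."""
    if len(task_xyz) != len(proc_xyz):
        raise ValueError("this sketch assumes one task per processor")
    task_order = rcb_order(task_xyz)
    proc_order = rcb_order(proc_xyz)
    mapping = [0] * len(task_xyz)
    # The k-th task in the task ordering goes to the k-th processor
    # in the processor ordering.
    for t, p in zip(task_order, proc_order):
        mapping[t] = p
    return mapping
```

Here `task_xyz` would hold application-domain coordinates of the tasks and `proc_xyz` the network coordinates of the allocated processors. Because both lists are ordered by the same rule, tasks that are close in the domain tend to land on processors that are close in the network, even when the allocated processors are numbered in an unrelated order.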

Authors:
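Deveci, Mehmet [1]; Devine, Karen D. [1]; Pedretti, Kevin [1]; Taylor, Mark [1]; Rajamanickam, Sivasankaran [1]; Catalyurek, Umit V. [2]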
  1. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
  2. Georgia Inst. of Technology, Atlanta, GA (United States)
Publication Date:
Research Org.:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
OSTI Identifier:
1502119
Report Number(s):
SAND-2019-2710J
Journal ID: ISSN 1045-9219; 673348
Grant/Contract Number:  
AC04-94AL85000
Resource Type:
Accepted Manuscript
Journal Name:
IEEE Transactions on Parallel and Distributed Systems
Additional Journal Information:
Journal Name: IEEE Transactions on Parallel and Distributed Systems; Journal ID: ISSN 1045-9219
Publisher:
IEEE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; task mapping; geometric partitioning; spatial partitioning; recursive bisection; jagged partitioning; load balancing

Citation Formats

Deveci, Mehmet, Devine, Karen D., Pedretti, Kevin, Taylor, Mark, Rajamanickam, Sivasankaran, and Catalyurek, Umit V. Geometric Mapping of Tasks to Processors on Parallel Computers with Mesh or Torus Networks. United States: N. p., 2019. Web. doi:10.1109/TPDS.2019.2900043.
Deveci, Mehmet, Devine, Karen D., Pedretti, Kevin, Taylor, Mark, Rajamanickam, Sivasankaran, & Catalyurek, Umit V. Geometric Mapping of Tasks to Processors on Parallel Computers with Mesh or Torus Networks. United States. doi:10.1109/TPDS.2019.2900043.
Deveci, Mehmet, Devine, Karen D., Pedretti, Kevin, Taylor, Mark, Rajamanickam, Sivasankaran, and Catalyurek, Umit V. Tue . "Geometric Mapping of Tasks to Processors on Parallel Computers with Mesh or Torus Networks". United States. doi:10.1109/TPDS.2019.2900043.
@article{osti_1502119,
title = {Geometric Mapping of Tasks to Processors on Parallel Computers with Mesh or Torus Networks},
author = {Deveci, Mehmet and Devine, Karen D. and Pedretti, Kevin and Taylor, Mark and Rajamanickam, Sivasankaran and Catalyurek, Umit V.},
abstractNote = {We present a new method for reducing parallel applications' communication time by mapping their MPI tasks to processors in a way that lowers the distance messages travel and the amount of congestion in the network. Assuming geometric proximity among the tasks is a good approximation of their communication interdependence, we use a geometric partitioning algorithm to order both the tasks and the processors, assigning task parts to the corresponding processor parts. In this way, interdependent tasks are assigned to “nearby” cores in the network. We also present a number of algorithmic optimizations that exploit specific features of the network or application to further improve the quality of the mapping. We specifically address the case of sparse node allocation, where the nodes assigned to a job are not necessarily located in a contiguous block nor within close proximity to each other in the network. Furthermore, our methods generalize to contiguous allocations as well, and results are shown for both contiguous and non-contiguous allocations. We show that, for the structured finite difference mini-application MiniGhost, our mapping methods reduced communication time up to 75% relative to MiniGhost's default mapping on 128K cores of a Cray XK7 with sparse allocation. For the atmospheric modeling code E3SM/HOMME, our methods reduced communication time up to 31% on 16K cores of an IBM BlueGene/Q with contiguous allocation.},
doi = {10.1109/TPDS.2019.2900043},
journal = {IEEE Transactions on Parallel and Distributed Systems},
number = {},
volume = {},
place = {United States},
year = {2019},
month = {3}
}
