OSTI.GOV: U.S. Department of Energy, Office of Scientific and Technical Information

Title: Reduce Operations: Send Volume Balancing While Minimizing Latency

Abstract

The communication hypergraph model was introduced in a two-phase setting to encapsulate multiple communication cost metrics (bandwidth and latency), which have proven important in parallelizing irregular applications. In the first phase, computational-task-to-processor assignment is performed with the objective of minimizing total communication volume while maintaining computational load balance. In the second phase, communication-task-to-processor assignment is performed with the objective of minimizing the total number of messages while maintaining communication-volume balance. The reduce-communication hypergraph model, however, fails to correctly encapsulate send-volume balancing. We propose a novel vertex weighting scheme that enables part weights to correctly encode the send-volume loads of processors for send-volume balancing. The model also increases the total communication volume during partitioning. To limit this increase, we propose a method that utilizes the recursive bipartitioning framework and refines each bipartition by vertex swaps. For performance evaluation, we consider column-parallel SpMV, one of the most widely known applications in which the reduce-task assignment problem arises. Extensive experiments on 313 matrices show that, compared to the existing model, the proposed models achieve considerable improvements in all communication cost metrics. Furthermore, these improvements lead to an average decrease of 30% in parallel SpMV time on 512 processors for 70 matrices with high irregularity.
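
To make the two metrics in the abstract concrete, the following minimal sketch counts per-processor send volume (bandwidth) and message count (latency) for the reduce phase of a column-parallel SpMV. It is an illustration only, not the authors' hypergraph model or partitioner: the function name `reduce_comm_stats` and its parameters are hypothetical, and it assumes each processor sends one word per row for which it holds a partial result owned by another processor.

```python
from collections import defaultdict


def reduce_comm_stats(nonzeros, col_part, row_owner, num_procs):
    """Count send volume and messages per processor (illustrative only).

    nonzeros  : iterable of (row, col) positions of the sparse matrix
    col_part  : col_part[j] is the processor that owns column j (phase 1)
    row_owner : row_owner[i] is the processor assigned reduce task i (phase 2)
    """
    # A processor produces a partial result for row i if it owns at least
    # one column with a nonzero in that row.
    producers = defaultdict(set)
    for i, j in nonzeros:
        producers[i].add(col_part[j])

    send_volume = [0] * num_procs
    destinations = [set() for _ in range(num_procs)]
    for i, procs in producers.items():
        owner = row_owner[i]
        for p in procs:
            if p != owner:
                # One word travels from p to the owner of reduce task i.
                send_volume[p] += 1
                destinations[p].add(owner)

    # Words to the same destination are assumed to share one message.
    messages = [len(d) for d in destinations]
    return send_volume, messages


# Hypothetical 4x4 example on 2 processors.
nonzeros = [(0, 0), (0, 2), (1, 1), (2, 0), (2, 3), (3, 3)]
col_part = [0, 0, 1, 1]

balanced = reduce_comm_stats(nonzeros, col_part, [0, 0, 1, 1], 2)
skewed = reduce_comm_stats(nonzeros, col_part, [0, 0, 0, 1], 2)
print(balanced)  # ([1, 1], [1, 1])
print(skewed)    # ([0, 2], [0, 1])
```

Both assignments move two words in total, but the second places the whole send load on one processor. That per-processor send imbalance is the quantity the abstract's send-volume balancing targets.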

Authors:
Karsavuran, M. Ozan [1]; Acer, Seher [2]; Aykanat, Cevdet [1]
  1. Bilkent Univ., Ankara (Turkey)
  2. Sandia National Lab. (SNL-CA), Livermore, CA (United States)
Publication Date:
Research Org.:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org.:
National Center for High Performance Computing of Turkey (UHeM); USDOE National Nuclear Security Administration (NNSA)
OSTI Identifier:
1601274
Alternate Identifier(s):
OSTI ID: 1595427
Report Number(s):
SAND-2020-1349J; SAND-2020-0151J
Journal ID: ISSN 1045-9219; 683607
Grant/Contract Number:  
AC04-94AL85000; 4005072018; NA-0003525
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
IEEE Transactions on Parallel and Distributed Systems
Additional Journal Information:
Journal Volume: 31; Journal Issue: 6; Journal ID: ISSN 1045-9219
Publisher:
IEEE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; communication hypergraph; communication cost; maximum communication volume; communication volume; latency; recursive bipartitioning; hypergraph partitioning; sparse matrix; sparse matrix-vector multiplication

Citation Formats

Karsavuran, M. Ozan, Acer, Seher, and Aykanat, Cevdet. Reduce Operations: Send Volume Balancing While Minimizing Latency. United States: N. p., 2020. Web. doi:10.1109/TPDS.2020.2964536.
Karsavuran, M. Ozan, Acer, Seher, & Aykanat, Cevdet. Reduce Operations: Send Volume Balancing While Minimizing Latency. United States. https://doi.org/10.1109/TPDS.2020.2964536
Karsavuran, M. Ozan, Acer, Seher, and Aykanat, Cevdet. 2020. "Reduce Operations: Send Volume Balancing While Minimizing Latency". United States. https://doi.org/10.1109/TPDS.2020.2964536. https://www.osti.gov/servlets/purl/1601274.
@article{osti_1601274,
title = {Reduce Operations: Send Volume Balancing While Minimizing Latency},
author = {Karsavuran, M. Ozan and Acer, Seher and Aykanat, Cevdet},
abstractNote = {Communication hypergraph model was introduced in a two-phase setting for encapsulating multiple communication cost metrics (bandwidth and latency), which are proven to be important in parallelizing irregular applications. In the first phase, computational-task-to-processor assignment is performed with the objective of minimizing total volume while maintaining computational load balance. In the second phase, communication-task-to-processor assignment is performed with the objective of minimizing total number of messages while maintaining communication-volume balance. The reduce-communication hypergraph model suffers from failing to correctly encapsulate send-volume balancing. We propose a novel vertex weighting scheme that enables part weights to correctly encode send-volume loads of processors for send-volume balancing. The model also suffers from increasing the total communication volume during partitioning. To decrease this increase, we propose a method that utilizes the recursive bipartitioning framework and refines each bipartition by vertex swaps. For performance evaluation, we consider column-parallel SpMV, which is one of the most widely known applications in which the reduce-task assignment problem arises. Extensive experiments on 313 matrices show that, compared to the existing model, the proposed models achieve considerable improvements in all communication cost metrics. Furthermore, these improvements lead to an average decrease of 30% in parallel SpMV time on 512 processors for 70 matrices with high irregularity.},
doi = {10.1109/TPDS.2020.2964536},
url = {https://www.osti.gov/biblio/1601274},
journal = {IEEE Transactions on Parallel and Distributed Systems},
issn = {1045-9219},
number = 6,
volume = 31,
place = {United States},
year = {2020},
month = {1}
}


