Title: ANALYSIS OF COMMUNICATION PERFORMANCE DEGRADATION OF THE RADIATION TRANSPORT CODE PIDOTS ON HIGH-UTILIZATION, MULTI-USER HPC SYSTEMS

Abstract

The PIDOTS radiation transport code implements a spatially decomposed Integral Transport Matrix Method (ITMM) solver, which is intended to fully utilize the capabilities of modern, massively parallel high-performance computing (HPC) systems. While the code shows promising results, there was also an unexpected loss of parallel efficiency on the test systems as the number of participating processors grew and the parallelization grain size got finer. We seek to identify and characterize the communication-latency based effects that contribute to this slowdown. Through the creation of a high-level, parameterized representation of the workload of the code, the communication methodology was identified as the likely reason. From there, low-level, InfiniBand communication performance data measured on the Falcon HPC at Idaho National Laboratory (INL) was used to produce a discrete-event model of the communication scheme. The data, as well as the model, show that infrequent, large spikes in send/receive latency drive an unexpectedly substantial increase in projected runtime of a code. Furthermore, these effects are more pronounced under the high-utilization, scattered-workload environment of Falcon. Based on results from the model, it can be shown that a high-communication volume, small message-size regime can easily lead to slowdowns of ~5-10x, when compared to ideal results, as the processor count increases.
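The mechanism described in the abstract, rare latency spikes whose cost grows with processor count, can be illustrated with a toy Monte Carlo sketch. This is not the authors' discrete-event model and it uses no Falcon measurements. It assumes a fully synchronized scheme in which every iteration waits for the slowest of `num_procs` messages, and all names and parameter values (`work_per_iter`, `base_latency`, `spike_latency`, `spike_prob`) are hypothetical.

```python
import random


def simulate_runtime(num_procs, num_iters, work_per_iter,
                     base_latency, spike_latency, spike_prob, seed=0):
    """Toy model: each iteration costs compute time plus the slowest message.

    Assumption (not from the paper): one message per process per iteration,
    and the iteration cannot finish until the slowest message arrives.
    """
    rng = random.Random(seed)
    compute = work_per_iter / num_procs  # perfectly divisible work
    total = 0.0
    for _ in range(num_iters):
        slowest = base_latency
        for _ in range(num_procs):
            # A single spiked message is enough to delay the whole iteration.
            if rng.random() < spike_prob:
                slowest = spike_latency
                break
        total += compute + slowest
    return total


def ideal_runtime(num_procs, num_iters, work_per_iter, base_latency):
    """Runtime if every message took exactly the base latency."""
    return num_iters * (work_per_iter / num_procs + base_latency)


def slowdown(num_procs, num_iters=2000, work_per_iter=1.0,
             base_latency=1e-6, spike_latency=1e-3, spike_prob=1e-3, seed=0):
    """Ratio of simulated runtime to the spike-free ideal."""
    actual = simulate_runtime(num_procs, num_iters, work_per_iter,
                              base_latency, spike_latency, spike_prob, seed)
    return actual / ideal_runtime(num_procs, num_iters, work_per_iter,
                                  base_latency)


if __name__ == "__main__":
    for p in (1, 16, 256, 1024, 4096):
        print(f"procs={p:5d}  slowdown={slowdown(p):.3f}")
```

Two effects compound in this sketch: the chance that at least one of the `num_procs` messages spikes in a given iteration rises with the processor count, while the compute time that could hide the delay shrinks as the grain size gets finer. The magnitudes it prints depend entirely on the made-up parameters and should not be compared with the ~5-10x figure reported above.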

Authors:
Schunert, Sebastian [1]; Garvey, Cormac T. [1]; Yessayan, Raffi [2]; Azmy, Yousry Y. [2]
  1. Idaho National Laboratory
  2. North Carolina State University
Publication Date:
Research Org.:
Idaho National Lab. (INL), Idaho Falls, ID (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA)
OSTI Identifier:
1478365
Report Number(s):
INL/CON-17-44029-Rev000
DOE Contract Number:  
AC07-05ID14517
Resource Type:
Conference
Resource Relation:
Conference: PHYSOR: Physics of Reactors 2018, Cancun, Mexico, 04/22/2018 - 04/26/2018
Country of Publication:
United States
Language:
English
Subject:
97 - MATHEMATICS AND COMPUTING; Parallel SN Methods; Spatial Domain Decomposition; Parallel Gauss-Seidel; Communication Cost on HPC

Citation Formats

Schunert, Sebastian, Garvey, Cormac T., Yessayan, Raffi, and Azmy, Yousry Y. ANALYSIS OF COMMUNICATION PERFORMANCE DEGRADATION OF THE RADIATION TRANSPORT CODE PIDOTS ON HIGH-UTILIZATION, MULTI-USER HPC SYSTEMS. United States: N. p., 2018. Web.
Schunert, Sebastian, Garvey, Cormac T., Yessayan, Raffi, & Azmy, Yousry Y. ANALYSIS OF COMMUNICATION PERFORMANCE DEGRADATION OF THE RADIATION TRANSPORT CODE PIDOTS ON HIGH-UTILIZATION, MULTI-USER HPC SYSTEMS. United States.
Schunert, Sebastian, Garvey, Cormac T., Yessayan, Raffi, and Azmy, Yousry Y. Sun . "ANALYSIS OF COMMUNICATION PERFORMANCE DEGRADATION OF THE RADIATION TRANSPORT CODE PIDOTS ON HIGH-UTILIZATION, MULTI-USER HPC SYSTEMS". United States. https://www.osti.gov/servlets/purl/1478365.
@article{osti_1478365,
title = {ANALYSIS OF COMMUNICATION PERFORMANCE DEGRADATION OF THE RADIATION TRANSPORT CODE PIDOTS ON HIGH-UTILIZATION, MULTI-USER HPC SYSTEMS},
author = {Schunert, Sebastian and Garvey, Cormac T. and Yessayan, Raffi and Azmy, Yousry Y.},
abstractNote = {The PIDOTS radiation transport code implements a spatially decomposed Integral Transport Matrix Method (ITMM) solver, which is intended to fully utilize the capabilities of modern, massively parallel high-performance computing (HPC) systems. While the code shows promising results, there was also an unexpected loss of parallel efficiency on the test systems as the number of participating processors grew and the parallelization grain size got finer. We seek to identify and characterize the communication-latency based effects that contribute to this slowdown. Through the creation of a high-level, parameterized representation of the workload of the code, the communication methodology was identified as the likely reason. From there, low-level, InfiniBand communication performance data measured on the Falcon HPC at Idaho National Laboratory (INL) was used to produce a discrete-event model of the communication scheme. The data, as well as the model, show that infrequent, large spikes in send/receive latency drive an unexpectedly substantial increase in projected runtime of a code. Furthermore, these effects are more pronounced under the high-utilization, scattered-workload environment of Falcon. Based on results from the model, it can be shown that a high-communication volume, small message-size regime can easily lead to slowdowns of ~5-10x, when compared to ideal results, as the processor count increases.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2018},
month = {4}
}

