DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: TPR: Traffic Pattern-based Adaptive Routing for Dragonfly Networks

Abstract

The Cray Cascade architecture uses Dragonfly as its interconnect topology and employs a globally adaptive routing scheme called UGAL. UGAL directs traffic based on link loads but may make inappropriate adaptive routing decisions in various situations, which degrades its performance. In this work, we propose traffic pattern-based adaptive routing (TPR) for Dragonfly, which improves UGAL by incorporating a traffic pattern-based adaptation mechanism. The idea is to explicitly use the link usage statistics collected in performance counters to infer the traffic pattern, and to consider both the inferred traffic pattern and the link loads when making adaptive routing decisions. Our performance evaluation on a diverse set of traffic conditions indicates that, by incorporating the traffic pattern-based adaptation mechanism, TPR is much more effective in making adaptive routing decisions and achieves significantly lower latency under low load and higher throughput under high load than its underlying UGAL.
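
The mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract gives no formulas, so the imbalance test, the threshold, the bias step, and all names below (PathEstimate, infer_bias, choose_minimal) are hypothetical. The only part taken from the general literature is the UGAL-style comparison of queue occupancy times hop count for a minimal and a non-minimal path.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PathEstimate:
    queue_len: int  # local queue occupancy toward the path's first hop (load proxy)
    hops: int       # number of hops on the candidate path


def infer_bias(link_counters: Sequence[int], threshold: float = 2.0, step: int = 4) -> int:
    """Hypothetical pattern inference from per-link usage counters.

    Balanced usage is treated as uniform-like traffic (favor minimal paths,
    positive bias); heavily skewed usage is treated as adversarial-like
    traffic (favor non-minimal paths, negative bias).
    """
    total = sum(link_counters)
    if total == 0:
        return 0  # no statistics yet: fall back to the unbiased comparison
    mean = total / len(link_counters)
    imbalance = max(link_counters) / mean
    return -step if imbalance > threshold else step


def choose_minimal(min_path: PathEstimate, nonmin_path: PathEstimate, bias: int = 0) -> bool:
    """UGAL-style test: take the minimal path when its estimated delay
    (queue length times hop count) does not exceed the non-minimal
    estimate plus a bias. Deriving the bias from an inferred traffic
    pattern is the assumption of this sketch."""
    min_cost = min_path.queue_len * min_path.hops
    nonmin_cost = nonmin_path.queue_len * nonmin_path.hops
    return min_cost <= nonmin_cost + bias


if __name__ == "__main__":
    counters = [10, 11, 9, 10]  # made-up, roughly balanced link usage
    bias = infer_bias(counters)
    print(choose_minimal(PathEstimate(3, 3), PathEstimate(1, 6), bias))
```

With a zero bias the comparison depends on link loads alone; a nonzero bias lets the inferred pattern shift the decision toward minimal or non-minimal paths.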

Authors:
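Faizian, Peyman [1]; Alfaro, Juan Francisco [1]; Rahman, Md Shafayat [1]; Mollah, Md Atiqul [1]; Yuan, Xin [1]; Pakin, Scott [2]; Lang, Michael [2]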
  1. Florida State Univ., Tallahassee, FL (United States). Dept. of Computer Science
  2. Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Publication Date:
2018-10-22
Research Org.:
Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA), Office of Defense Programs (DP)
OSTI Identifier:
1481984
Report Number(s):
LA-UR-18-20582
Journal ID: ISSN 2372-207X
Grant/Contract Number:  
AC52-06NA25396
Resource Type:
Accepted Manuscript
Journal Name:
IEEE Transactions on Multi-Scale Computing Systems
Additional Journal Information:
Journal Volume: 4; Journal Issue: 4; Journal ID: ISSN 2372-207X
Publisher:
IEEE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Computer Science; Dragonfly Topology; Cray Cascade; Traffic Pattern-based Adaptive Routing

Citation Formats

Faizian, Peyman, Alfaro, Juan Francisco, Rahman, Md Shafayat, Mollah, Md Atiqul, Yuan, Xin, Pakin, Scott, and Lang, Michael. TPR: Traffic Pattern-based Adaptive Routing for Dragonfly Networks. United States: N. p., 2018. Web. doi:10.1109/TMSCS.2018.2877264.
Faizian, Peyman, Alfaro, Juan Francisco, Rahman, Md Shafayat, Mollah, Md Atiqul, Yuan, Xin, Pakin, Scott, & Lang, Michael. TPR: Traffic Pattern-based Adaptive Routing for Dragonfly Networks. United States. https://doi.org/10.1109/TMSCS.2018.2877264
Faizian, Peyman, Alfaro, Juan Francisco, Rahman, Md Shafayat, Mollah, Md Atiqul, Yuan, Xin, Pakin, Scott, and Lang, Michael. 2018. "TPR: Traffic Pattern-based Adaptive Routing for Dragonfly Networks". United States. https://doi.org/10.1109/TMSCS.2018.2877264. https://www.osti.gov/servlets/purl/1481984.
@article{osti_1481984,
title = {TPR: Traffic Pattern-based Adaptive Routing for Dragonfly Networks},
author = {Faizian, Peyman and Alfaro, Juan Francisco and Rahman, Md Shafayat and Mollah, Md Atiqul and Yuan, Xin and Pakin, Scott and Lang, Michael},
abstractNote = {The Cray Cascade architecture uses Dragonfly as its interconnect topology and employs a globally adaptive routing scheme called UGAL. UGAL directs traffic based on link loads but may make inappropriate adaptive routing decisions in various situations, which degrades its performance. In this work, we propose traffic pattern-based adaptive routing (TPR) for Dragonfly, which improves UGAL by incorporating a traffic pattern-based adaptation mechanism. The idea is to explicitly use the link usage statistics collected in performance counters to infer the traffic pattern, and to consider both the inferred traffic pattern and the link loads when making adaptive routing decisions. Our performance evaluation on a diverse set of traffic conditions indicates that, by incorporating the traffic pattern-based adaptation mechanism, TPR is much more effective in making adaptive routing decisions and achieves significantly lower latency under low load and higher throughput under high load than its underlying UGAL.},
doi = {10.1109/TMSCS.2018.2877264},
journal = {IEEE Transactions on Multi-Scale Computing Systems},
number = 4,
volume = 4,
place = {United States},
year = {2018},
month = {10}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Figures / Tables:

Fig. 1: Cray Cascade intra-group topology
