OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: TPR: Traffic Pattern-based Adaptive Routing for Dragonfly Networks

Abstract

The Cray Cascade architecture uses Dragonfly as its interconnect topology and employs a globally adaptive routing scheme called UGAL. UGAL directs traffic based on link loads but may make inappropriate adaptive routing decisions in various situations, which degrades its performance. In this work, we propose traffic pattern-based adaptive routing (TPR) for Dragonfly, which improves on UGAL by incorporating a traffic pattern-based adaptation mechanism. The idea is to explicitly use the link usage statistics collected in performance counters to infer the traffic pattern, and to take both the inferred traffic pattern and link loads into consideration when making adaptive routing decisions. Our performance evaluation on a diverse set of traffic conditions indicates that, by incorporating the traffic pattern-based adaptation mechanism, TPR is much more effective in making adaptive routing decisions and achieves significantly lower latency under low load and higher throughput under high load than its underlying UGAL.
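
The load-based decision that the abstract attributes to UGAL can be illustrated with a minimal sketch. It uses the commonly described UGAL comparison (queue occupancy times hop count for the minimal path versus a non-minimal, Valiant-style path). The abstract does not describe how TPR combines the inferred traffic pattern with link loads, so the bias table, the pattern labels, and the function names below are hypothetical placeholders, not the authors' mechanism.

```python
# Sketch only: the generic UGAL comparison plus a HYPOTHETICAL pattern-dependent bias.
# The bias values and pattern labels are illustrative assumptions, not taken from the paper.

HYPOTHETICAL_BIAS = {"uniform": 4, "adversarial": -4, "unknown": 0}


def ugal_choose(q_min, hops_min, q_vlb, hops_vlb, bias=0):
    """Pick the minimal path ("MIN") or a non-minimal Valiant path ("VLB").

    q_* are local queue occupancies for the candidate output ports and
    hops_* are the path lengths; a positive bias favors the minimal path.
    """
    min_cost = q_min * hops_min
    vlb_cost = q_vlb * hops_vlb
    return "MIN" if min_cost <= vlb_cost + bias else "VLB"


def pattern_aware_choose(q_min, hops_min, q_vlb, hops_vlb, pattern="unknown"):
    """Apply the UGAL comparison with a bias looked up from an inferred pattern label."""
    bias = HYPOTHETICAL_BIAS.get(pattern, 0)
    return ugal_choose(q_min, hops_min, q_vlb, hops_vlb, bias)
```

With these placeholder values, the same link loads can lead to different choices depending on the inferred pattern, which is the kind of behavior the abstract describes at a high level.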

Authors:
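Faizian, Peyman [1]; Alfaro, Juan Francisco [1]; Rahman, Md Shafayat [1]; Mollah, Md Atiqul [1]; Yuan, Xin [1]; Pakin, Scott [2]; Lang, Michael [2]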
  1. Florida State Univ., Tallahassee, FL (United States). Dept. of Computer Science
  2. Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Publication Date:
Research Org.:
Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA), Office of Defense Programs (DP) (NA-10)
OSTI Identifier:
1481984
Report Number(s):
LA-UR-18-20582
Journal ID: ISSN 2372-207X
Grant/Contract Number:  
AC52-06NA25396
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
IEEE Transactions on Multi-Scale Computing Systems
Publisher:
IEEE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Computer Science; Dragonfly Topology; Cray Cascade; Traffic Pattern-based Adaptive Routing

Citation Formats

Faizian, Peyman, Alfaro, Juan Francisco, Rahman, Md Shafayat, Mollah, Md Atiqul, Yuan, Xin, Pakin, Scott, and Lang, Michael. TPR: Traffic Pattern-based Adaptive Routing for Dragonfly Networks. United States: N. p., 2018. Web. doi:10.1109/TMSCS.2018.2877264.
Faizian, Peyman, Alfaro, Juan Francisco, Rahman, Md Shafayat, Mollah, Md Atiqul, Yuan, Xin, Pakin, Scott, & Lang, Michael. TPR: Traffic Pattern-based Adaptive Routing for Dragonfly Networks. United States. doi:10.1109/TMSCS.2018.2877264.
Faizian, Peyman, Alfaro, Juan Francisco, Rahman, Md Shafayat, Mollah, Md Atiqul, Yuan, Xin, Pakin, Scott, and Lang, Michael. 2018. "TPR: Traffic Pattern-based Adaptive Routing for Dragonfly Networks". United States. doi:10.1109/TMSCS.2018.2877264.
@article{osti_1481984,
title = {TPR: Traffic Pattern-based Adaptive Routing for Dragonfly Networks},
author = {Faizian, Peyman and Alfaro, Juan Francisco and Rahman, Md Shafayat and Mollah, Md Atiqul and Yuan, Xin and Pakin, Scott and Lang, Michael},
abstractNote = {The Cray Cascade architecture uses Dragonfly as its interconnect topology and employs a globally adaptive routing scheme called UGAL. UGAL directs traffic based on link loads but may make inappropriate adaptive routing decisions in various situations, which degrades its performance. In this work, we propose traffic pattern-based adaptive routing (TPR) for Dragonfly, which improves on UGAL by incorporating a traffic pattern-based adaptation mechanism. The idea is to explicitly use the link usage statistics collected in performance counters to infer the traffic pattern, and to take both the inferred traffic pattern and link loads into consideration when making adaptive routing decisions. Our performance evaluation on a diverse set of traffic conditions indicates that, by incorporating the traffic pattern-based adaptation mechanism, TPR is much more effective in making adaptive routing decisions and achieves significantly lower latency under low load and higher throughput under high load than its underlying UGAL.},
doi = {10.1109/TMSCS.2018.2877264},
journal = {IEEE Transactions on Multi-Scale Computing Systems},
issn = {2372-207X},
number = {},
volume = {},
place = {United States},
year = {2018},
month = {10}
}

Journal Article:
Free Publicly Available Full Text
This content will become publicly available on October 22, 2019
Publisher's Version of Record
