TPR: Traffic Pattern-based Adaptive Routing for Dragonfly Networks
Abstract
The Cray Cascade architecture uses Dragonfly as its interconnect topology and employs a globally adaptive routing scheme called UGAL. UGAL directs traffic based on link loads but may make inappropriate adaptive routing decisions in various situations, which degrades its performance. In this work, we propose traffic pattern-based adaptive routing (TPR) for Dragonfly, which improves UGAL by incorporating a traffic pattern-based adaptation mechanism. The idea is to explicitly use the link usage statistics collected in performance counters to infer the traffic pattern, and to take both the inferred traffic pattern and the link loads into consideration when making adaptive routing decisions. Our performance evaluation on a diverse set of traffic conditions indicates that, by incorporating the traffic pattern-based adaptation mechanism, TPR is much more effective at making adaptive routing decisions and achieves significantly lower latency under low load and higher throughput under high load than its underlying UGAL.
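The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give TPR's actual inference rule or parameters. The sketch assumes the commonly described UGAL-style test (queue length times hop count, compared between a minimal and a non-minimal path) and adds a hypothetical bias derived from two hypothetical counters, min_pkts and non_min_pkts. All names, bias values, and the threshold are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class PathEstimate:
    queue_len: int  # packets queued at the first hop of this candidate path
    hops: int       # path length in hops


def ugal_prefers_minimal(min_path, non_min_path, bias=0):
    """Generic UGAL-style test: prefer the minimal path when its estimated
    delay (queue length x hops) does not exceed the non-minimal estimate
    plus a bias term."""
    min_cost = min_path.queue_len * min_path.hops
    non_min_cost = non_min_path.queue_len * non_min_path.hops
    return min_cost <= non_min_cost + bias


def infer_bias(min_pkts, non_min_pkts,
               uniform_bias=4, adversarial_bias=-4, threshold=0.5):
    """Hypothetical inference step (an assumption, not taken from the paper):
    if most recently counted packets were routed non-minimally, treat the
    pattern as adversarial and bias against the minimal path; otherwise
    bias toward it."""
    total = min_pkts + non_min_pkts
    if total == 0:
        return 0  # no statistics yet: fall back to the plain UGAL-style test
    non_min_share = non_min_pkts / total
    return adversarial_bias if non_min_share > threshold else uniform_bias


def tpr_like_route(min_path, non_min_path, min_pkts, non_min_pkts):
    """Combine link loads with the inferred pattern, as the abstract describes
    at a high level."""
    bias = infer_bias(min_pkts, non_min_pkts)
    if ugal_prefers_minimal(min_path, non_min_path, bias):
        return "minimal"
    return "non-minimal"


if __name__ == "__main__":
    short_busy = PathEstimate(queue_len=3, hops=3)
    long_idle = PathEstimate(queue_len=1, hops=6)
    print(tpr_like_route(short_busy, long_idle, min_pkts=90, non_min_pkts=10))
    print(tpr_like_route(short_busy, long_idle, min_pkts=10, non_min_pkts=90))
```

In this sketch, the same pair of link-load estimates can lead to different decisions depending on the counters, which is the kind of effect the abstract attributes to taking the inferred traffic pattern into account.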
- Authors:
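- Faizian, Peyman; Alfaro, Juan Francisco; Rahman, Md Shafayat; Mollah, Md Atiqul; Yuan, Xin; Pakin, Scott; Lang, Michael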
- Florida State Univ., Tallahassee, FL (United States). Dept. of Computer Science
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
- Publication Date:
- 2018-10-22
- Research Org.:
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
- Sponsoring Org.:
- USDOE National Nuclear Security Administration (NNSA), Office of Defense Programs (DP)
- OSTI Identifier:
- 1481984
- Report Number(s):
- LA-UR-18-20582
- Journal ID: ISSN 2372-207X
- Grant/Contract Number:
- AC52-06NA25396
- Resource Type:
- Accepted Manuscript
- Journal Name:
- IEEE Transactions on Multi-Scale Computing Systems
- Additional Journal Information:
- Journal Volume: 4; Journal Issue: 4; Journal ID: ISSN 2372-207X
- Publisher:
- IEEE
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; Computer Science; Dragonfly Topology; Cray Cascade; Traffic Pattern-based Adaptive Routing
Citation Formats
Faizian, Peyman, Alfaro, Juan Francisco, Rahman, Md Shafayat, Mollah, Md Atiqul, Yuan, Xin, Pakin, Scott, and Lang, Michael. TPR: Traffic Pattern-based Adaptive Routing for Dragonfly Networks. United States: N. p., 2018.
Web. doi:10.1109/TMSCS.2018.2877264.
Faizian, Peyman, Alfaro, Juan Francisco, Rahman, Md Shafayat, Mollah, Md Atiqul, Yuan, Xin, Pakin, Scott, & Lang, Michael. TPR: Traffic Pattern-based Adaptive Routing for Dragonfly Networks. United States. https://doi.org/10.1109/TMSCS.2018.2877264
Faizian, Peyman, Alfaro, Juan Francisco, Rahman, Md Shafayat, Mollah, Md Atiqul, Yuan, Xin, Pakin, Scott, and Lang, Michael. 2018. "TPR: Traffic Pattern-based Adaptive Routing for Dragonfly Networks". United States. https://doi.org/10.1109/TMSCS.2018.2877264. https://www.osti.gov/servlets/purl/1481984.
@article{osti_1481984,
title = {TPR: Traffic Pattern-based Adaptive Routing for Dragonfly Networks},
author = {Faizian, Peyman and Alfaro, Juan Francisco and Rahman, Md Shafayat and Mollah, Md Atiqul and Yuan, Xin and Pakin, Scott and Lang, Michael},
abstractNote = {The Cray Cascade architecture uses Dragonfly as its interconnect topology and employs a globally adaptive routing scheme called UGAL. UGAL directs traffic based on link loads but may make inappropriate adaptive routing decisions in various situations, which degrades its performance. In this work, we propose traffic pattern-based adaptive routing (TPR) for Dragonfly, which improves UGAL by incorporating a traffic pattern-based adaptation mechanism. The idea is to explicitly use the link usage statistics collected in performance counters to infer the traffic pattern, and to take both the inferred traffic pattern and the link loads into consideration when making adaptive routing decisions. Our performance evaluation on a diverse set of traffic conditions indicates that, by incorporating the traffic pattern-based adaptation mechanism, TPR is much more effective at making adaptive routing decisions and achieves significantly lower latency under low load and higher throughput under high load than its underlying UGAL.},
doi = {10.1109/TMSCS.2018.2877264},
journal = {IEEE Transactions on Multi-Scale Computing Systems},
number = 4,
volume = 4,
place = {United States},
year = {2018},
month = {10}
}