Modeling Universal Globally Adaptive Load-Balanced Routing
Abstract
Universal globally adaptive load-balanced (UGAL) routing has been proposed for various interconnection networks and has been deployed in a number of current-generation supercomputers. Although UGAL-based schemes have been extensively studied, most existing results are based on either simulation or measurement. Without a theoretical understanding of UGAL, multiple questions remain: For which traffic patterns is UGAL best suited? What determines the performance of a UGAL-based scheme on a particular network configuration? Here, we develop a set of throughput models for UGAL based on linear programming. We show that the throughput models are valid across the torus, Dragonfly, and Slim Fly network topologies. Finally, we identify a robust model that can accurately and efficiently predict UGAL throughput for a set of representative traffic patterns across different topologies. Our models not only provide a mechanism to predict UGAL performance on large-scale interconnection networks but also reveal the inner workings of UGAL and further our understanding of this type of routing.
- Authors:
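- Mollah, Md Atiqul; Wang, Wenqi; Faizian, Peyman; Rahman, MD Shafayat; Yuan, Xin; Pakin, Scott; Lang, Michael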
- Oakland Univ., Rochester, MI (United States)
- Florida State Univ., Tallahassee, FL (United States)
- Univ. of North Florida, Jacksonville, FL (United States)
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
- Publication Date:
- 2019-09-10
- Research Org.:
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
- Sponsoring Org.:
- USDOE National Nuclear Security Administration (NNSA), Office of Defense Programs (DP)
- OSTI Identifier:
- 1565855
- Report Number(s):
- LA-UR-18-28331; Journal ID: ISSN 2329-4949
- Grant/Contract Number:
- 89233218CNA000001
- Resource Type:
- Accepted Manuscript
- Journal Name:
- ACM Transactions on Parallel Computing
- Additional Journal Information:
- Journal Volume: 6; Journal Issue: 2; Journal ID: ISSN 2329-4949
- Publisher:
- Association for Computing Machinery
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING
Citation Formats
Mollah, Md Atiqul, Wang, Wenqi, Faizian, Peyman, Rahman, MD Shafayat, Yuan, Xin, Pakin, Scott, and Lang, Michael. Modeling Universal Globally Adaptive Load-Balanced Routing. United States: N. p., 2019. Web. doi:10.1145/3349620.
Mollah, Md Atiqul, Wang, Wenqi, Faizian, Peyman, Rahman, MD Shafayat, Yuan, Xin, Pakin, Scott, & Lang, Michael. Modeling Universal Globally Adaptive Load-Balanced Routing. United States. https://doi.org/10.1145/3349620
Mollah, Md Atiqul, Wang, Wenqi, Faizian, Peyman, Rahman, MD Shafayat, Yuan, Xin, Pakin, Scott, and Lang, Michael. 2019. "Modeling Universal Globally Adaptive Load-Balanced Routing". United States. https://doi.org/10.1145/3349620. https://www.osti.gov/servlets/purl/1565855.
@article{osti_1565855,
title = {Modeling Universal Globally Adaptive Load-Balanced Routing},
author = {Mollah, Md Atiqul and Wang, Wenqi and Faizian, Peyman and Rahman, MD Shafayat and Yuan, Xin and Pakin, Scott and Lang, Michael},
abstractNote = {Universal globally adaptive load-balanced (UGAL) routing has been proposed for various interconnection networks and has been deployed in a number of current-generation supercomputers. Although UGAL-based schemes have been extensively studied, most existing results are based on either simulation or measurement. Without a theoretical understanding of UGAL, multiple questions remain: For which traffic patterns is UGAL best suited? What determines the performance of a UGAL-based scheme on a particular network configuration? Here, we develop a set of throughput models for UGAL based on linear programming. We show that the throughput models are valid across the torus, Dragonfly, and Slim Fly network topologies. Finally, we identify a robust model that can accurately and efficiently predict UGAL throughput for a set of representative traffic patterns across different topologies. Our models not only provide a mechanism to predict UGAL performance on large-scale interconnection networks but also reveal the inner workings of UGAL and further our understanding of this type of routing.},
doi = {10.1145/3349620},
journal = {ACM Transactions on Parallel Computing},
number = 2,
volume = 6,
place = {United States},
year = {2019},
month = {9}
}