DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Modeling Universal Globally Adaptive Load-Balanced Routing

Abstract

Universal globally adaptive load-balanced (UGAL) routing has been proposed for various interconnection networks and has been deployed in a number of current-generation supercomputers. Although UGAL-based schemes have been extensively studied, most existing results are based on either simulation or measurement. Without a theoretical understanding of UGAL, multiple questions remain: For which traffic patterns is UGAL most suited? Furthermore, what determines the performance of a UGAL-based scheme on a particular network configuration? Here, we develop a set of throughput models for UGAL based on linear programming. We show that the throughput models are valid across the torus, Dragonfly, and Slim Fly network topologies. Finally, we identify a robust model that can accurately and efficiently predict UGAL throughput for a set of representative traffic patterns across different topologies. Our models not only provide a mechanism to predict UGAL performance on large-scale interconnection networks but also reveal the inner workings of UGAL and further our understanding of this type of routing.

Authors:
 Mollah, Md Atiqul [1]; Wang, Wenqi [2]; Faizian, Peyman [3]; Rahman, MD Shafayat [2]; Yuan, Xin [2]; Pakin, Scott [4]; Lang, Michael [4]
  1. Oakland Univ., Rochester, MI (United States)
  2. Florida State Univ., Tallahassee, FL (United States)
  3. Univ. of North Florida, Jacksonville, FL (United States)
  4. Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Publication Date:
2019-09-10
Research Org.:
Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA), Office of Defense Programs (DP)
OSTI Identifier:
1565855
Report Number(s):
LA-UR-18-28331
Journal ID: ISSN 2329-4949
Grant/Contract Number:  
89233218CNA000001
Resource Type:
Accepted Manuscript
Journal Name:
ACM Transactions on Parallel Computing
Additional Journal Information:
Journal Volume: 6; Journal Issue: 2; Journal ID: ISSN 2329-4949
Publisher:
Association for Computing Machinery
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Mollah, Md Atiqul, Wang, Wenqi, Faizian, Peyman, Rahman, MD Shafayat, Yuan, Xin, Pakin, Scott, and Lang, Michael. Modeling Universal Globally Adaptive Load-Balanced Routing. United States: N. p., 2019. Web. doi:10.1145/3349620.
Mollah, Md Atiqul, Wang, Wenqi, Faizian, Peyman, Rahman, MD Shafayat, Yuan, Xin, Pakin, Scott, & Lang, Michael. Modeling Universal Globally Adaptive Load-Balanced Routing. United States. https://doi.org/10.1145/3349620
Mollah, Md Atiqul, Wang, Wenqi, Faizian, Peyman, Rahman, MD Shafayat, Yuan, Xin, Pakin, Scott, and Lang, Michael. 2019. "Modeling Universal Globally Adaptive Load-Balanced Routing". United States. https://doi.org/10.1145/3349620. https://www.osti.gov/servlets/purl/1565855.
@article{osti_1565855,
title = {Modeling Universal Globally Adaptive Load-Balanced Routing},
author = {Mollah, Md Atiqul and Wang, Wenqi and Faizian, Peyman and Rahman, MD Shafayat and Yuan, Xin and Pakin, Scott and Lang, Michael},
abstractNote = {Universal globally adaptive load-balanced (UGAL) routing has been proposed for various interconnection networks and has been deployed in a number of current-generation supercomputers. Although UGAL-based schemes have been extensively studied, most existing results are based on either simulation or measurement. Without a theoretical understanding of UGAL, multiple questions remain: For which traffic patterns is UGAL most suited? Furthermore, what determines the performance of a UGAL-based scheme on a particular network configuration? Here, we develop a set of throughput models for UGAL based on linear programming. We show that the throughput models are valid across the torus, Dragonfly, and Slim Fly network topologies. Finally, we identify a robust model that can accurately and efficiently predict UGAL throughput for a set of representative traffic patterns across different topologies. Our models not only provide a mechanism to predict UGAL performance on large-scale interconnection networks but also reveal the inner workings of UGAL and further our understanding of this type of routing.},
doi = {10.1145/3349620},
journal = {ACM Transactions on Parallel Computing},
number = 2,
volume = 6,
place = {United States},
year = {2019},
month = {9}
}
