U.S. Department of Energy
Office of Scientific and Technical Information

Performance trade-offs in reconfigurable networks for HPC

Journal Article · Journal of Optical Communications and Networking
DOI: https://doi.org/10.1364/JOCN.451760 · OSTI ID: 1874993

Designing efficient interconnects to support high-bandwidth and low-latency communication is critical to realizing high performance computing (HPC) and data center (DC) systems in the exascale era. At extreme computing scales, providing the requisite bandwidth through overprovisioning becomes impractical. These challenges have motivated studies exploring reconfigurable network architectures that can adapt to traffic patterns at runtime using optical circuit switching. Despite the plethora of proposed architectures, surprisingly little is known about the relative performance and trade-offs among different reconfigurable network designs. We aim to bridge this gap by tackling two key issues in reconfigurable network design. First, we study how cost, power consumption, network performance, and scalability vary based on optical circuit switch (OCS) placement in the physical topology. Specifically, we consider two classes of reconfigurable architectures: one that places OCSs between top-of-rack (ToR) switches, termed ToR-reconfigurable networks (TRNs), and one that places OCSs between pods of racks, termed pod-reconfigurable networks (PRNs). Second, we tackle the effects of reconfiguration frequency on network performance. Our results, based on network simulations driven by real HPC and DC workloads, show that while TRNs are optimized for low fan-out communication patterns, they are less suited for carrying high fan-out workloads. PRNs exhibit a better overall trade-off: they perform comparably to a fully non-blocking fat tree for low fan-out workloads and significantly outperform TRNs for high fan-out communication patterns.

Sponsoring Organization:
USDOE Advanced Research Projects Agency - Energy (ARPA-E)
OSTI ID:
1874993
Alternate ID(s):
OSTI ID: 1867045
Journal Information:
Journal of Optical Communications and Networking, Vol. 14, Issue 6; ISSN JOCNBB; ISSN 1943-0620
Publisher:
Optical Society of America
Country of Publication:
United States
Language:
English

