Performance trade-offs in reconfigurable networks for HPC
Designing efficient interconnects to support high-bandwidth and low-latency communication is critical to realizing high performance computing (HPC) and data center (DC) systems in the exascale era. At extreme computing scales, providing the requisite bandwidth through overprovisioning becomes impractical. These challenges have motivated studies exploring reconfigurable network architectures that can adapt to traffic patterns at runtime using optical circuit switching. Despite the plethora of proposed architectures, surprisingly little is known about the relative performance and trade-offs among different reconfigurable network designs. We aim to bridge this gap by tackling two key issues in reconfigurable network design. First, we study how cost, power consumption, network performance, and scalability vary based on optical circuit switch (OCS) placement in the physical topology. Specifically, we consider two classes of reconfigurable architectures: one that places OCSs between top-of-rack (ToR) switches, called ToR-reconfigurable networks (TRNs), and one that places OCSs between pods of racks, called pod-reconfigurable networks (PRNs). Second, we examine the effects of reconfiguration frequency on network performance. Our results, based on network simulations driven by real HPC and DC workloads, show that while TRNs are optimized for low fan-out communication patterns, they are less suited to carrying high fan-out workloads. PRNs exhibit a better overall trade-off: they perform comparably to a fully non-blocking fat tree for low fan-out workloads and significantly outperform TRNs for high fan-out communication patterns.
- Sponsoring Organization: USDOE Advanced Research Projects Agency - Energy (ARPA-E)
- OSTI ID: 1874993
- Alternate ID(s): OSTI ID: 1867045
- Journal Information: Journal Name: Journal of Optical Communications and Networking; Journal Volume: 14; Journal Issue: 6; ISSN JOCNBB; ISSN 1943-0620
- Publisher: Optical Society of America
- Country of Publication: United States
- Language: English