U.S. Department of Energy
Office of Scientific and Technical Information

Machine-learning-aided cognitive reconfiguration for flexible-bandwidth HPC and data center networks [Invited]

Journal Article · Journal of Optical Communications and Networking
DOI: https://doi.org/10.1364/JOCN.412360 · OSTI ID: 1842377

This paper proposes a machine-learning (ML)-aided cognitive approach for effective bandwidth reconfiguration in optically interconnected data center/high-performance computing (HPC) systems. The proposed approach relies on a Hyper-X-like architecture augmented with flexible-bandwidth photonic interconnections at large scales using a hierarchical intra/inter-POD photonic switching layout. We first formulate the problem of connectivity graph and routing scheme optimization as a mixed-integer linear programming model. A two-phase heuristic algorithm and a joint optimization approach are devised to solve the problem with low time complexity. Then, we propose an ML-based end-to-end performance estimator to assist the network control plane with intelligent decision making for bandwidth reconfiguration. Numerical simulations using traffic distribution profiles extracted from HPC application traces, as well as random traffic matrices, verify the accuracy of the ML-based estimator (<9% error) and demonstrate up to 5× throughput gain from the proposed approach compared with the baseline Hyper-X network using fixed all-to-all intra/inter-POD interconnects.

Sponsoring Organization:
USDOE
Grant/Contract Number:
SC0019526; SC0019582
OSTI ID:
1842377
Alternate ID(s):
OSTI ID: 1853355
Journal Information:
Journal of Optical Communications and Networking, Vol. 13, Issue 6; ISSN JOCNBB; ISSN 1943-0620
Publisher:
Optical Society of America
Country of Publication:
United States
Language:
English

