U.S. Department of Energy
Office of Scientific and Technical Information

Distributed deep learning training using silicon photonic switched architectures

Journal Article · APL Photonics
DOI: https://doi.org/10.1063/5.0070711 · OSTI ID: 1978979
 [1];  [2];  [2];  [2];  [2];  [2];  [2]
  1. Columbia University, New York, NY 10027 (United States)
  2. Columbia University, New York, NY (United States)
The scaling trends of deep learning models and distributed training workloads are challenging network capacities in today’s datacenters and high-performance computing (HPC) systems. We propose a system architecture that leverages silicon photonic (SiP) switch-enabled server regrouping using bandwidth steering to tackle the challenges and accelerate distributed deep learning training. In addition, our proposed system architecture utilizes a highly integrated operating system-based SiP switch control scheme to reduce implementation complexity. To demonstrate the feasibility of our proposal, we built an experimental testbed with a SiP switch-enabled reconfigurable fat tree topology and evaluated the network performance of distributed ring all-reduce and parameter server workloads. The experimental results show up to 3.6× improvements over the static non-reconfigurable fat tree. Our large-scale simulation results show that server regrouping can deliver up to 2.3× flow throughput improvement for a 2× tapered fat tree and a further 11% improvement when higher-layer bandwidth steering is employed. The collective results show the potential of integrating SiP switches into datacenters and HPC systems to accelerate distributed deep learning training.
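The abstract names ring all-reduce as one of the evaluated workloads but does not describe an implementation. As a rough illustration of why that workload is sensitive to server placement, the following is a minimal sketch of the generic ring all-reduce pattern (a sum over equal-length gradient vectors). It is not the authors' code; the function name `ring_all_reduce` and the list-based "servers" are assumptions made for the example.

```python
def ring_all_reduce(vectors):
    """Simulate a sum ring all-reduce over equal-length lists, one list per server.

    Hypothetical helper for illustration only. Returns (data, sent), where
    data[i] is server i's final vector and sent[i] counts elements it transmitted.
    """
    n = len(vectors)
    size = len(vectors[0])
    assert size % n == 0, "vector length must divide evenly into n chunks"
    chunk = size // n
    data = [list(v) for v in vectors]
    sent = [0] * n

    def span(c):
        # Slice covering chunk index c.
        return slice(c * chunk, (c + 1) * chunk)

    # Phase 1: reduce-scatter. In step s, server i sends chunk (i - s) mod n
    # to its ring neighbor (i + 1) mod n, which adds it to its own copy.
    for s in range(n - 1):
        msgs = [(i, (i - s) % n, data[i][span((i - s) % n)]) for i in range(n)]
        for i, c, payload in msgs:
            dst = (i + 1) % n
            data[dst][span(c)] = [a + b for a, b in zip(data[dst][span(c)], payload)]
            sent[i] += chunk

    # Server i now holds the fully reduced chunk (i + 1) mod n.
    # Phase 2: all-gather. In step s, server i forwards chunk (i + 1 - s) mod n
    # to the same neighbor, which overwrites its copy with the reduced values.
    for s in range(n - 1):
        msgs = [(i, (i + 1 - s) % n, data[i][span((i + 1 - s) % n)]) for i in range(n)]
        for i, c, payload in msgs:
            dst = (i + 1) % n
            data[dst][span(c)] = payload
            sent[i] += chunk

    return data, sent
```

In this sketch every server exchanges data only with its ring neighbor and transmits 2(N-1)/N of the vector in total, so the bandwidth between neighboring servers determines how fast the collective completes. This is consistent with the abstract's motivation for regrouping servers, although the paper's actual placement and steering policies are not reproduced here.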
Research Organization:
Columbia University, New York, NY (United States)
Sponsoring Organization:
National Security Agency (NSA); USDOE Advanced Research Projects Agency - Energy (ARPA-E); USDOE Office of Science (SC), Office of SBIR/STTR Programs (SBIR/STTR)
Grant/Contract Number:
AR0000843
OSTI ID:
1978979
Journal Information:
Journal Name: APL Photonics; Journal Volume: 7; Journal Issue: 3; ISSN 2378-0967
Publisher:
American Institute of Physics (AIP)
Country of Publication:
United States
Language:
English
