U.S. Department of Energy
Office of Scientific and Technical Information

Flexible silicon photonic architecture for accelerating distributed deep learning

Journal Article · Journal of Optical Communications and Networking
DOI: https://doi.org/10.1364/JOCN.497372 · OSTI ID: 2280467

The increasing size and complexity of deep learning (DL) models have led to the wide adoption of distributed training methods in datacenters (DCs) and high-performance computing (HPC) systems. However, communication among distributed computing units (CUs) has emerged as a major bottleneck in the training process. In this study, we propose Flex-SiPAC, a flexible silicon photonic accelerated compute cluster designed to accelerate multi-tenant distributed DL training workloads. Flex-SiPAC takes a co-design approach that combines a silicon photonic hardware platform with a tailored collective algorithm, optimized to leverage the unique physical properties of the architecture. The hardware platform integrates a novel wavelength-reconfigurable transceiver design and a micro-resonator-based wavelength-reconfigurable switch, enabling the system to achieve flexible bandwidth steering in the wavelength domain. The collective algorithm is designed to support reconfigurable topologies, enabling efficient all-reduce communications that are commonly used in DL training. The feasibility of the Flex-SiPAC architecture is demonstrated through two testbed experiments. First, an optical testbed experiment demonstrates the flexible routing of wavelengths by shuffling an array of input wavelengths using a custom-designed spatial-wavelength selective switch. Second, a four-GPU testbed running two DL workloads shows a 23% improvement in job completion time compared to a similarly sized leaf-spine topology. We further evaluate Flex-SiPAC using large-scale simulations, which show that Flex-SiPAC is able to reduce the communication time by 26% to 29% compared to state-of-the-art compute clusters under representative collective operations.
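The collective algorithm described above targets all-reduce, the operation that dominates gradient synchronization in data-parallel DL training. As background only (this is not the paper's algorithm, and all names are illustrative), a minimal plain-Python simulation of the classic ring all-reduce pattern: each of n nodes splits its gradient into n chunks, a reduce-scatter phase leaves each node holding one fully summed chunk, and an all-gather phase circulates the completed chunks.

```python
# Background sketch, not from the paper: ring all-reduce simulated
# sequentially in plain Python. All names here are illustrative.

def ring_all_reduce(data):
    """data[i][c] = node i's value for chunk c; reduced in place."""
    n = len(data)
    # Phase 1: reduce-scatter. At step s, node i sends chunk (i - s) mod n
    # to node (i + 1) mod n, which accumulates it. After n - 1 steps each
    # node holds exactly one fully summed chunk.
    for s in range(n - 1):
        for i in range(n):
            c = (i - s) % n
            data[(i + 1) % n][c] += data[i][c]
    # Phase 2: all-gather. At step s, node i forwards the now-complete
    # chunk (i + 1 - s) mod n to node (i + 1) mod n, which overwrites its
    # stale copy. After n - 1 steps every node holds every summed chunk.
    for s in range(n - 1):
        for i in range(n):
            c = (i + 1 - s) % n
            data[(i + 1) % n][c] = data[i][c]
    return data

# Example: 4 nodes, where chunk c on node i starts as i * 10 + c.
nodes = [[i * 10 + c for c in range(4)] for i in range(4)]
ring_all_reduce(nodes)
# Every node now holds the elementwise sum across nodes for every chunk.
```

Each node sends 2(n - 1)/n of its gradient in total, which is why ring all-reduce is bandwidth-optimal and why the inter-node links it traverses become the bottleneck the paper's wavelength steering targets.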

Sponsoring Organization:
USDOE Advanced Research Projects Agency - Energy (ARPA-E)
OSTI ID:
2280467
Journal Information:
Journal of Optical Communications and Networking, Vol. 16, Issue 2; ISSN 1943-0620
Publisher:
Optical Society of America
Country of Publication:
United States
Language:
English


Similar Records

Distributed deep learning training using silicon photonic switched architectures
Journal Article · 2022 · APL Photonics · OSTI ID: 1978979

LEED: A Lightwave Energy-Efficient Datacenter
Technical Report · 2024 · OSTI ID: 2565965

New trends in photonic switching and optical networking architectures for data centers and computing systems [Invited]
Journal Article · 2022 · Journal of Optical Communications and Networking · OSTI ID: 2421350
