Title: Exploring the benefits of using co-packaged optics in data center and AI supercomputer networks: a simulation-based analysis [Invited]

Journal Article · Journal of Optical Communications and Networking
DOI: https://doi.org/10.1364/JOCN.501427 · OSTI ID: 2279094

We investigate the advantages of using co-packaged optics in next-generation data center and AI supercomputer networks. The increased escape bandwidth offered by co-packaged optics provides multiple possibilities for building 50T switches and beyond, expanding the opportunities in both the data center and supercomputing domains. This lets network architects expand their design space and develop simplified networks with enhanced network locality properties. Co-packaging at the switch and server points enables networks with double the capacity while reducing the switch count by 64% compared to state-of-the-art systems. We evaluate these concepts through discrete-event simulations using all-to-all and all-reduce traffic patterns that simulate collective communications commonly found in network-bound applications. First, we investigate the all-to-all overhead involved in distributing the virtual machines (VMs) of the applications across multiple leaf switches and compare it to the scenario in which all VMs are placed under a single switch. We then evaluate the performance of an AI supercomputing cluster by simulating both patterns for different message sizes, while also varying the number of participating nodes. The results suggest that networks with improved locality properties become increasingly important as the network stack operates at higher speeds; for a stack latency of 1.25 µs, placing the applications under multiple switches can result in up to 68% higher completion times than placing them under a single switch. For AI supercomputers, significant improvements are observed in the mean server throughput, reaching more than 90% for configurations involving 256 nodes and message sizes of at least 128 KiB.
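The locality argument in the abstract can be illustrated with a small counting sketch. It is not the authors' discrete-event simulator and models no latency or bandwidth; it only counts how many flows of an all-to-all exchange must leave their leaf switch when VMs are spread over several leaves. The function names and the round-robin VM placement are assumptions made for illustration.

```python
from itertools import permutations


def all_to_all_flows(num_vms):
    """Return every ordered (src, dst) pair with src != dst."""
    return list(permutations(range(num_vms), 2))


def inter_switch_fraction(num_vms, num_leaves):
    """Fraction of all-to-all flows whose endpoints sit under different leaves.

    Assumption: VMs are assigned to leaf switches round-robin.
    """
    leaf_of = [vm % num_leaves for vm in range(num_vms)]
    flows = all_to_all_flows(num_vms)
    crossing = sum(1 for src, dst in flows if leaf_of[src] != leaf_of[dst])
    return crossing / len(flows)


if __name__ == "__main__":
    for leaves in (1, 2, 4, 8):
        share = inter_switch_fraction(8, leaves)
        print(f"{leaves} leaf switch(es): {share:.3f} of flows cross leaves")
```

With all VMs under a single leaf, no flow crosses switches; as the VMs are spread over more leaves, a growing share of the exchange takes extra hops, which is the overhead the abstract compares against the single-switch placement.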

Sponsoring Organization:
USDOE Advanced Research Projects Agency - Energy (ARPA-E)
Grant/Contract Number:
AR0000846
OSTI ID:
2279094
Journal Information:
Journal of Optical Communications and Networking, Vol. 16, Issue 2; ISSN 1943-0620
Publisher:
Optical Society of America
Country of Publication:
United States
Language:
English
