U.S. Department of Energy
Office of Scientific and Technical Information

Evaluating On-Node GPU Interconnects for Deep Learning Workloads

Conference

Scaling deep learning workloads across multiple GPUs on a single node has become increasingly important in data analytics. A key question is how well a PCIe-based GPU interconnect can perform relative to a custom high-performance interconnect such as NVIDIA's NVLink. This paper evaluates two such on-node interconnects for eight NVIDIA Pascal P100 GPUs: (a) the NVIDIA DGX-1's NVLink 1.0 'hybrid cube mesh'; and (b) the Cirrascale GX8's two-level PCIe tree using dual SR3615 switch risers. To show the effects of a range of neural network workloads, we define a parameterized version of the popular ResNet. We define a workload intensity metric that characterizes the expected computation/communication ratio; we also locate AlexNet and GoogLeNet within that space. As expected, the DGX-1 typically has superior performance. However, when equalizing GPU SM frequencies, the GX8 is very competitive on all ResNet workloads. With 8 GPUs, the GX8 can outperform the DGX-1 on all-to-all reductions by 10% for medium-sized payloads; and in rare cases, the GX8 slightly outperforms on ResNet.
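The abstract's workload intensity metric characterizes the ratio of computation to communication in data-parallel training: per step, each GPU computes forward/backward passes locally, then all-reduces the layer gradients across the interconnect. The sketch below illustrates this kind of ratio for a single convolutional layer; the function names and the exact formula are illustrative assumptions, not the paper's definition.

```python
# Hypothetical sketch of a computation/communication intensity metric
# for data-parallel training. Per step: compute FLOPs locally, then
# all-reduce the layer's weight gradients over the GPU interconnect.
# These helpers and the formula are illustrative, not the paper's own.

def conv_flops(h, w, c_in, c_out, k, batch):
    """Approximate multiply-add FLOPs for one conv layer's forward pass
    (output spatial size assumed equal to input for simplicity)."""
    return 2 * batch * h * w * c_in * c_out * k * k

def conv_grad_bytes(c_in, c_out, k, bytes_per=4):
    """All-reduce payload: the layer's weight gradients in fp32."""
    return c_in * c_out * k * k * bytes_per

def intensity(h, w, c_in, c_out, k, batch):
    """FLOPs per gradient byte exchanged; higher means the workload is
    more compute-bound and less sensitive to interconnect bandwidth."""
    return conv_flops(h, w, c_in, c_out, k, batch) / conv_grad_bytes(c_in, c_out, k)

# Larger batches raise intensity linearly: more local compute amortizes
# the same fixed gradient exchange, which is one way a PCIe tree can
# stay competitive with NVLink on compute-heavy ResNet configurations.
print(intensity(56, 56, 64, 64, 3, batch=32))
print(intensity(56, 56, 64, 64, 3, batch=256))
```

Note the communication term depends only on the model's parameter count, while the computation term scales with batch size and spatial resolution; that is why a parameterized ResNet can sweep the intensity space that AlexNet and GoogLeNet occupy at fixed points.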

Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-76RL01830
OSTI ID:
1525777
Report Number(s):
PNNL-SA-129849
Country of Publication:
Switzerland
Language:
English

Similar Records

Scaling Deep Learning Workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing
Conference · Jul 2017 · OSTI ID: 1373860

Scaling deep learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing
Conference · Aug 2017 · OSTI ID: 1411927

Scaling Deep Learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing
Journal Article · May 2018 · Future Generations Computer Systems · OSTI ID: 1617450
