Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite
- BATTELLE (PACIFIC NW LAB)
- College of William and Mary
In this paper, we fill the gap by proposing a multi-GPU benchmark suite named Tartan, which contains microbenchmarks, scale-up applications and scale-out applications. We then apply Tartan to evaluate four modern GPU interconnects, i.e., PCI-e, NVLink-V1, NVLink-V2 and InfiniBand with GPUDirect-RDMA, on two recently released NVIDIA super AI platforms as well as ORNL’s exascale prototype system. Based on empirical evaluation, we observe four new types of NUMA effects: three are triggered by NVLink’s topology, connectivity and routing, while one is caused by PCI-e (i.e., anti-locality). These effects are very important for performance tuning in multi-GPU environments. Our evaluation results show that, unless the current CPU-GPU master-slave programming model can be replaced, it is difficult for scale-up multi-GPU applications to really benefit from faster intra-node interconnects such as NVLink. For inter-node scale-out applications, although the interconnect is more crucial to overall performance, GPUDirect-RDMA is not always the optimal choice.
- Research Organization: Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: AC05-76RL01830
- OSTI ID: 1511696
- Report Number(s): PNNL-SA-137642
- Resource Relation: Conference: IEEE International Symposium on Workload Characterization (IISWC 2018), September 30-October 2, 2018
- Country of Publication: United States
- Language: English