Analysis of performance improvements for host and GPU interface of the APENet+ 3D Torus network

Abstract

APEnet+ is an INFN (Italian Institute for Nuclear Physics) project aiming to develop a custom 3-dimensional torus interconnect network optimized for hybrid CPU-GPU clusters dedicated to high-performance scientific computing. The APEnet+ interconnect fabric is built on an FPGA-based PCI Express board with 6 bi-directional off-board links, each showing 34 Gbps of raw bandwidth per direction, and leverages the peer-to-peer capabilities of Fermi- and Kepler-class NVIDIA GPUs to obtain real zero-copy, low-latency GPU-to-GPU transfers. APEnet+ transfer latency is minimized by adopting an RDMA protocol implemented in the FPGA, with specialized hardware blocks tightly coupled to an embedded microprocessor. This architecture provides a high-performance, low-latency offload engine for both the transmit and receive sides of data transactions: preliminary results are encouraging, showing a 50% bandwidth increase for transfers with large packet sizes. In this paper we describe the APEnet+ architecture, detail the hardware implementation, and discuss the impact of such specialized RDMA hardware on host interface latency and bandwidth.
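The topology figures quoted in the abstract (a 3D torus, 6 links per board, 34 Gbps raw per link per direction) can be illustrated with a small sketch. This is not taken from the paper: the function names, the coordinate convention, and the 4x4x4 lattice size are assumptions chosen only for illustration.

```python
# Illustrative sketch only: names and lattice size are hypothetical,
# not part of the APEnet+ software or hardware description.

LINKS_PER_NODE = 6    # off-board links per board, from the abstract
LINK_RAW_GBPS = 34    # raw bandwidth per link per direction, from the abstract


def torus_neighbors(coord, dims):
    """Return the six neighbours of a node in a 3D torus.

    Order: X+, X-, Y+, Y-, Z+, Z-. Coordinates wrap around at the
    lattice boundary, which is what distinguishes a torus from a mesh.
    """
    neighbors = []
    for axis in range(3):
        for step in (1, -1):
            n = list(coord)
            n[axis] = (n[axis] + step) % dims[axis]
            neighbors.append(tuple(n))
    return neighbors


def aggregate_raw_gbps():
    """Raw outbound bandwidth of one node summed over all six links."""
    return LINKS_PER_NODE * LINK_RAW_GBPS


if __name__ == "__main__":
    dims = (4, 4, 4)  # hypothetical lattice size
    print(torus_neighbors((0, 0, 0), dims))
    print(aggregate_raw_gbps(), "Gbps raw, per direction, per node")
```

Each node has one neighbour in each of the six directions, which matches the six bi-directional links per board; the aggregate figure is a raw upper bound, not a measured throughput.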
Authors:
Ammendola A, R [1]; Biagioni, A; Frezza, O; Lo Cicero, F; Lonardo, A; Paolucci, P S; Rossetti, D; Simula, F; Tosoratto, L; Vicini, P [2]
  1. INFN Roma II, Via della Ricerca Scientifica 1 – 00133 Roma (Italy)
  2. INFN Roma I, P.le Aldo Moro 2 – 00185 Roma (Italy)
Publication Date:
Jun 06, 2014
Product Type:
Journal Article
Resource Relation:
Journal Name: Journal of Physics. Conference Series (Online); Journal Volume: 523; Journal Issue: 1; Conference: ACAT2013: 15. international workshop on advanced computing and analysis techniques in physics research, Beijing (China), 16-21 May 2013; Other Information: Country of input: International Atomic Energy Agency (IAEA)
Subject:
97 MATHEMATICAL METHODS AND COMPUTING; COMPUTER ARCHITECTURE; COMPUTER NETWORKS; DISTRIBUTED DATA PROCESSING; HOST; IMPLEMENTATION; INTERFACES; MICROPROCESSORS; PERFORMANCE; THREE-DIMENSIONAL CALCULATIONS
OSTI ID:
22377881
Country of Origin:
United Kingdom
Language:
English
Other Identifying Numbers:
Journal ID: ISSN 1742-6596; TRN: GB15P9858083343
Availability:
Available from http://dx.doi.org/10.1088/1742-6596/523/1/012013
Submitting Site:
INIS
Size:
[8 page(s)]
Announcement Date:
Aug 13, 2015

Citation Formats

Ammendola A, R, Biagioni, A, Frezza, O, Lo Cicero, F, Lonardo, A, Paolucci, P S, Rossetti, D, Simula, F, Tosoratto, L, and Vicini, P. Analysis of performance improvements for host and GPU interface of the APENet+ 3D Torus network. United Kingdom: N. p., 2014. Web. doi:10.1088/1742-6596/523/1/012013.
Ammendola A, R, Biagioni, A, Frezza, O, Lo Cicero, F, Lonardo, A, Paolucci, P S, Rossetti, D, Simula, F, Tosoratto, L, & Vicini, P. Analysis of performance improvements for host and GPU interface of the APENet+ 3D Torus network. United Kingdom. https://doi.org/10.1088/1742-6596/523/1/012013
Ammendola A, R, Biagioni, A, Frezza, O, Lo Cicero, F, Lonardo, A, Paolucci, P S, Rossetti, D, Simula, F, Tosoratto, L, and Vicini, P. 2014. "Analysis of performance improvements for host and GPU interface of the APENet+ 3D Torus network." United Kingdom. https://doi.org/10.1088/1742-6596/523/1/012013.
@misc{etde_22377881,
title = {Analysis of performance improvements for host and GPU interface of the APENet+ 3D Torus network},
author = {Ammendola A, R, Biagioni, A, Frezza, O, Lo Cicero, F, Lonardo, A, Paolucci, P S, Rossetti, D, Simula, F, Tosoratto, L, and Vicini, P},
abstractNote = {APEnet+ is an INFN (Italian Institute for Nuclear Physics) project aiming to develop a custom 3-dimensional torus interconnect network optimized for hybrid CPU-GPU clusters dedicated to high-performance scientific computing. The APEnet+ interconnect fabric is built on an FPGA-based PCI Express board with 6 bi-directional off-board links, each showing 34 Gbps of raw bandwidth per direction, and leverages the peer-to-peer capabilities of Fermi- and Kepler-class NVIDIA GPUs to obtain real zero-copy, low-latency GPU-to-GPU transfers. APEnet+ transfer latency is minimized by adopting an RDMA protocol implemented in the FPGA, with specialized hardware blocks tightly coupled to an embedded microprocessor. This architecture provides a high-performance, low-latency offload engine for both the transmit and receive sides of data transactions: preliminary results are encouraging, showing a 50% bandwidth increase for transfers with large packet sizes. In this paper we describe the APEnet+ architecture, detail the hardware implementation, and discuss the impact of such specialized RDMA hardware on host interface latency and bandwidth.},
doi = {10.1088/1742-6596/523/1/012013},
journal = {Journal of Physics. Conference Series (Online)},
issue = {1},
volume = {523},
journaltype = {AC},
place = {United Kingdom},
year = {2014},
month = {Jun}
}