Abstract
APEnet+ is an INFN (Italian Institute for Nuclear Physics) project aiming to develop a custom 3-dimensional torus interconnect network optimized for hybrid CPU-GPU clusters dedicated to high-performance scientific computing. The APEnet+ interconnect fabric is built on an FPGA-based PCI Express board with 6 bidirectional off-board links showing 34 Gbps of raw bandwidth per direction, and leverages the peer-to-peer capabilities of Fermi- and Kepler-class NVIDIA GPUs to obtain real zero-copy, low-latency GPU-to-GPU transfers. APEnet+ transfer latency is minimized by adopting an RDMA protocol implemented in the FPGA with specialized hardware blocks tightly coupled with an embedded microprocessor. This architecture provides a high-performance, low-latency offload engine for both the transmit and receive sides of data transactions: preliminary results are encouraging, showing a 50% bandwidth increase for large packet size transfers. In this paper we describe the APEnet+ architecture, detail the hardware implementation, and discuss the impact of such RDMA-specialized hardware on host interface latency and bandwidth.
Ammendola, R [1]; Biagioni, A; Frezza, O; Lo Cicero, F; Lonardo, A; Paolucci, P S; Rossetti, D; Simula, F; Tosoratto, L; Vicini, P [2]
[1] INFN Roma II, Via della Ricerca Scientifica 1 – 00133 Roma (Italy)
[2] INFN Roma I, P.le Aldo Moro 2 – 00185 Roma (Italy)
Citation Formats
Ammendola, R, Biagioni, A, Frezza, O, Lo Cicero, F, Lonardo, A, Paolucci, P S, Rossetti, D, Simula, F, Tosoratto, L, and Vicini, P. Analysis of performance improvements for host and GPU interface of the APENet+ 3D Torus network. United Kingdom: N. p., 2014. Web. doi:10.1088/1742-6596/523/1/012013.
Ammendola, R, Biagioni, A, Frezza, O, Lo Cicero, F, Lonardo, A, Paolucci, P S, Rossetti, D, Simula, F, Tosoratto, L, & Vicini, P. Analysis of performance improvements for host and GPU interface of the APENet+ 3D Torus network. United Kingdom. https://doi.org/10.1088/1742-6596/523/1/012013
Ammendola, R, Biagioni, A, Frezza, O, Lo Cicero, F, Lonardo, A, Paolucci, P S, Rossetti, D, Simula, F, Tosoratto, L, and Vicini, P. 2014. "Analysis of performance improvements for host and GPU interface of the APENet+ 3D Torus network." United Kingdom. https://doi.org/10.1088/1742-6596/523/1/012013.
@misc{etde_22377881,
title = {Analysis of performance improvements for host and GPU interface of the APENet+ 3D Torus network},
author = {Ammendola, R and Biagioni, A and Frezza, O and Lo Cicero, F and Lonardo, A and Paolucci, P S and Rossetti, D and Simula, F and Tosoratto, L and Vicini, P},
abstractNote = {APEnet+ is an INFN (Italian Institute for Nuclear Physics) project aiming to develop a custom 3-dimensional torus interconnect network optimized for hybrid CPU-GPU clusters dedicated to high-performance scientific computing. The APEnet+ interconnect fabric is built on an FPGA-based PCI Express board with 6 bidirectional off-board links showing 34 Gbps of raw bandwidth per direction, and leverages the peer-to-peer capabilities of Fermi- and Kepler-class NVIDIA GPUs to obtain real zero-copy, low-latency GPU-to-GPU transfers. APEnet+ transfer latency is minimized by adopting an RDMA protocol implemented in the FPGA with specialized hardware blocks tightly coupled with an embedded microprocessor. This architecture provides a high-performance, low-latency offload engine for both the transmit and receive sides of data transactions: preliminary results are encouraging, showing a 50% bandwidth increase for large packet size transfers. In this paper we describe the APEnet+ architecture, detail the hardware implementation, and discuss the impact of such RDMA-specialized hardware on host interface latency and bandwidth.},
doi = {10.1088/1742-6596/523/1/012013},
journal = {},
issue = {1},
volume = {523},
journaltype = {AC},
place = {United Kingdom},
year = {2014},
month = {Jun}
}