Architectural improvements and 28 nm FPGA implementation of the APEnet+ 3D Torus network for hybrid HPC systems

Ammendola, Roberto; Biagioni, Andrea; Frezza, Ottorino; Cicero, Francesca Lo; Paolucci, Pier Stanislao; Lonardo, Alessandro; Rossetti, Davide; Simula, Francesco; Tosoratto, Laura; Vicini, Piero

doi:10.1088/1742-6596/513/5/052002

Made available by
U.S. Department of Energy
Office of Scientific and Technical Information

ETDEWeb

Abstract

Modern Graphics Processing Units (GPUs) are now regarded as accelerators for general-purpose computation. Tight interaction between the GPU and the interconnection network is needed to exploit the full capability-computing potential of multi-GPU systems on large HPC clusters, which is why an efficient and scalable interconnect is a key technology for bringing GPUs to scientific HPC. In this paper we show the latest architectural and performance improvements of the APEnet+ network fabric, an FPGA-based PCIe board with 6 fully bidirectional off-board links with 34 Gbps of raw bandwidth per direction, and x8 Gen2 bandwidth towards the host PC. The board implements a Remote Direct Memory Access (RDMA) protocol that leverages the peer-to-peer (P2P) capabilities of Fermi- and Kepler-class NVIDIA GPUs to obtain true zero-copy, low-latency GPU-to-GPU transfers. Finally, we report on the development activities for 2013, focusing on the adoption of latest-generation 28 nm FPGAs and the preliminary tests performed on this new platform.
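The six off-board links match the six nearest neighbours a node has on a 3D torus (one in each direction along X, Y and Z, with wrap-around at the edges). The following is a minimal Python sketch of that neighbour relation only. It is not taken from the paper: the function name `torus_neighbors`, the coordinate convention, and the 4x4x4 example size are assumptions made for illustration, and nothing here reflects how APEnet+ actually addresses or routes packets.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def torus_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Return the six nearest neighbours of `node` on a 3D torus.

    `dims` is the (hypothetical) number of nodes along X, Y and Z.
    Coordinates wrap around, so the node at index 0 along an axis
    is adjacent to the node at index dims[axis] - 1.
    """
    neighbors = []
    for axis in range(3):          # X, Y, Z
        for step in (+1, -1):      # positive and negative direction
            coord = list(node)
            coord[axis] = (coord[axis] + step) % dims[axis]
            neighbors.append((coord[0], coord[1], coord[2]))
    return neighbors


if __name__ == "__main__":
    # Illustrative 4x4x4 torus; the corner node still has six neighbours.
    for n in torus_neighbors((0, 0, 0), (4, 4, 4)):
        print(n)
```

On a torus with only two nodes along an axis, the positive and negative neighbours along that axis coincide, so the list then contains repeated coordinates.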

Authors:

Ammendola, Roberto ^[1]; Biagioni, Andrea; Frezza, Ottorino; Cicero, Francesca Lo; Paolucci, Pier Stanislao; Lonardo, Alessandro; Rossetti, Davide; Simula, Francesco; Tosoratto, Laura; Vicini, Piero ^[2]

[1] INFN Sezione Roma Tor Vergata (Italy)
[2] INFN Sezione Roma (Italy)

Publication Date:

Jun 11, 2014

DOI:

https://doi.org/10.1088/1742-6596/513/5/052002

Product Type:

Journal Article

Resource Relation:

Journal Name: Journal of Physics. Conference Series (Online); Journal Volume: 513; Journal Issue: 5; Conference: CHEP2013: 20th International Conference on Computing in High Energy and Nuclear Physics, Amsterdam (Netherlands), 14-18 Oct 2013; Other Information: Country of input: International Atomic Energy Agency (IAEA)

Subject:

97 MATHEMATICAL METHODS AND COMPUTING; ACCELERATORS; CALCULATION METHODS; COMPUTER ARCHITECTURE; COMPUTER NETWORKS; COMPUTER-GRAPHICS DEVICES; INTERACTIONS; MEMORY MANAGEMENT; PERFORMANCE

OSTI ID:

22381205

Country of Origin:

United Kingdom

Language:

English

Other Identifying Numbers:

Journal ID: ISSN 1742-6596; TRN: GB15P9445086719

Availability:

Available from http://dx.doi.org/10.1088/1742-6596/513/5/052002

Submitting Site:

INIS

Size:

[5 page(s)]

Announcement Date:

Aug 21, 2015

Citation Formats

Ammendola, Roberto, Biagioni, Andrea, Frezza, Ottorino, Cicero, Francesca Lo, Paolucci, Pier Stanislao, Lonardo, Alessandro, Rossetti, Davide, Simula, Francesco, Tosoratto, Laura, and Vicini, Piero. Architectural improvements and 28 nm FPGA implementation of the APEnet+ 3D Torus network for hybrid HPC systems. United Kingdom: N. p., 2014. Web. doi:10.1088/1742-6596/513/5/052002.

Ammendola, Roberto, Biagioni, Andrea, Frezza, Ottorino, Cicero, Francesca Lo, Paolucci, Pier Stanislao, Lonardo, Alessandro, Rossetti, Davide, Simula, Francesco, Tosoratto, Laura, & Vicini, Piero. Architectural improvements and 28 nm FPGA implementation of the APEnet+ 3D Torus network for hybrid HPC systems. United Kingdom. https://doi.org/10.1088/1742-6596/513/5/052002

Ammendola, Roberto, Biagioni, Andrea, Frezza, Ottorino, Cicero, Francesca Lo, Paolucci, Pier Stanislao, Lonardo, Alessandro, Rossetti, Davide, Simula, Francesco, Tosoratto, Laura, and Vicini, Piero. 2014. "Architectural improvements and 28 nm FPGA implementation of the APEnet+ 3D Torus network for hybrid HPC systems." United Kingdom. https://doi.org/10.1088/1742-6596/513/5/052002.

@misc{etde_22381205,
title = {Architectural improvements and 28 nm FPGA implementation of the APEnet+ 3D Torus network for hybrid HPC systems},
author = {Ammendola, Roberto and Biagioni, Andrea and Frezza, Ottorino and Cicero, Francesca Lo and Paolucci, Pier Stanislao and Lonardo, Alessandro and Rossetti, Davide and Simula, Francesco and Tosoratto, Laura and Vicini, Piero},
abstractNote = {Modern Graphics Processing Units (GPUs) are now regarded as accelerators for general-purpose computation. Tight interaction between the GPU and the interconnection network is needed to exploit the full capability-computing potential of multi-GPU systems on large HPC clusters, which is why an efficient and scalable interconnect is a key technology for bringing GPUs to scientific HPC. In this paper we show the latest architectural and performance improvements of the APEnet+ network fabric, an FPGA-based PCIe board with 6 fully bidirectional off-board links with 34 Gbps of raw bandwidth per direction, and x8 Gen2 bandwidth towards the host PC. The board implements a Remote Direct Memory Access (RDMA) protocol that leverages the peer-to-peer (P2P) capabilities of Fermi- and Kepler-class NVIDIA GPUs to obtain true zero-copy, low-latency GPU-to-GPU transfers. Finally, we report on the development activities for 2013, focusing on the adoption of latest-generation 28 nm FPGAs and the preliminary tests performed on this new platform.},
doi = {10.1088/1742-6596/513/5/052002},
journal = {Journal of Physics. Conference Series (Online)},
issue = {5},
volume = {513},
journaltype = {AC},
place = {United Kingdom},
year = {2014},
month = {Jun}
}
