Using Multirail Networks in High Performance Clusters
Salvador Coll, Eitan Frachtenberg, Fabrizio Petrini,
Adolfy Hoisie and Leonid Gurvits
CCS-3 Modeling, Algorithms, & Informatics Group
Computer & Computational Sciences Division
Los Alamos National Laboratory
{scoll,eitanf,fabrizio,hoisie,gurvits}@lanl.gov
Abstract
Using multiple independent networks (also known as rails) is an emerging technique to overcome bandwidth limitations
and enhance fault tolerance of current high-performance parallel computers. In this paper we present and analyze various
algorithms to allocate multiple communication rails, including static and dynamic allocation schemes. An analytical lower
bound on the number of rails required for static rail allocation is shown. We also present an extensive experimental comparison
of the behavior of various algorithms in terms of bandwidth and latency. We show that striping messages over multiple
rails can substantially reduce network latency, depending on average message size, network load, and allocation scheme. The
compared methods include a static rail allocation, a basic round-robin rail allocation, a local-dynamic allocation based on
local knowledge, and a dynamic rail allocation that reserves both communication endpoints of a message before sending it.
The last method is shown to perform better than the others at higher loads: up to [percentage illegible in source] better than local-knowledge allocation
and [percentage illegible in source] better than the round-robin allocation. This allocation scheme also shows lower latency, and it saturates at higher
loads (for sufficiently long messages). Most importantly, this proposed allocation scheme scales well with the number of rails and
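
The four allocation schemes compared in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function and class names, the modulo mapping used for the static scheme, and the per-rail busy flags are all assumptions made for illustration; the abstract only states which information each scheme relies on.

```python
from typing import List, Optional


def static_rail(src: int, dst: int, num_rails: int) -> int:
    """Static allocation: each (src, dst) pair always uses the same rail.

    The modulo mapping is a hypothetical choice, not taken from the paper.
    """
    return (src + dst) % num_rails


class RoundRobin:
    """Round-robin allocation: a sender cycles through its rails in order."""

    def __init__(self, num_rails: int) -> None:
        self.num_rails = num_rails
        self.next_rail = 0

    def pick(self) -> int:
        rail = self.next_rail
        self.next_rail = (rail + 1) % self.num_rails
        return rail


def local_dynamic_rail(local_busy: List[bool]) -> Optional[int]:
    """Local-dynamic allocation: use only the sender's own knowledge.

    Returns the first rail that is idle at the sender, or None if all are busy.
    """
    for rail, busy in enumerate(local_busy):
        if not busy:
            return rail
    return None


def dynamic_rail(src_busy: List[bool], dst_busy: List[bool]) -> Optional[int]:
    """Dynamic allocation: pick a rail that is free at BOTH endpoints.

    A real system would then reserve that rail at source and destination
    before sending; the reservation protocol is omitted here.
    """
    for rail, (s, d) in enumerate(zip(src_busy, dst_busy)):
        if not s and not d:
            return rail
    return None


if __name__ == "__main__":
    rr = RoundRobin(num_rails=3)
    print([rr.pick() for _ in range(4)])
    print(static_rail(src=1, dst=2, num_rails=2))
    print(local_dynamic_rail([True, False]))
    print(dynamic_rail([True, False, False], [False, True, False]))
```

In this sketch the local-dynamic scheme can choose a rail that is already busy at the destination, whereas the dynamic scheme only returns a rail that is idle at both ends. That difference is one plausible reading of why reserving both endpoints helps at higher loads.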

  

Source: Arnau, Salvador Coll - Departamento de Ingeniería Electrónica, Universitat Politècnica de València
Petrini, Fabrizio - Computational Sciences and Mathematics Division, Pacific Northwest National Laboratory

 

Collections: Computer Technologies and Information Sciences; Engineering