Scalable NIC-based reduction on large-scale clusters
- Adam
- Juan C.
- Fabrizio
Many parallel algorithms require efficient support for reduction collectives. Over the years, researchers have developed optimal reduction algorithms by taking into account system size, data size, and the complexity of the reduction operations. However, all of these algorithms have assumed that the reduction processing takes place on the host CPU. Modern Network Interface Cards (NICs) sport programmable processors with substantial memory and thus introduce a fresh variable into the equation. This raises the following interesting challenge: Can we take advantage of modern NICs to implement fast reduction operations? In this paper, we take on this challenge in the context of large-scale clusters. Through experiments on the 960-node, 1920-processor ASCI Linux Cluster (ALC) located at the Lawrence Livermore National Laboratory, we show that NIC-based reductions indeed perform with reduced latency and improved consistency over host-based algorithms for the common case and that these benefits scale as the system grows. In the largest configuration tested (1812 processors), our NIC-based algorithm can sum a single-element vector in 73 μs with 32-bit integers and in 118 μs with 64-bit floating-point numbers. These results represent an improvement of 121% and 39%, respectively, over the production-level MPI library.
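For readers unfamiliar with reduction collectives, the following is a minimal sketch of what such an operation computes. It simulates a generic binomial-tree sum in a single process. It is not the NIC-based algorithm evaluated in the paper, which this record does not describe; the function name `tree_reduce` and its parameters are hypothetical and chosen only for illustration.

```python
import operator
from typing import Callable, List, Sequence, Tuple


def tree_reduce(
    values: Sequence[Sequence[float]],
    op: Callable[[float, float], float] = operator.add,
) -> Tuple[List[float], int]:
    """Simulate a binomial-tree reduction over per-process vectors.

    values[i] is the vector contributed by simulated process i.
    Returns the element-wise reduced vector (as it would arrive at
    process 0) and the number of communication rounds used.
    Illustrative only: this is an assumption-based sketch, not the
    paper's NIC-based implementation.
    """
    if not values:
        raise ValueError("need at least one contribution")

    # Copy the inputs so the caller's data is left untouched.
    partial = [list(v) for v in values]
    n = len(partial)
    rounds = 0
    stride = 1

    # In each round, process r + stride "sends" its partial result to
    # process r, which combines it element-wise with its own.
    while stride < n:
        for receiver in range(0, n, 2 * stride):
            sender = receiver + stride
            if sender < n:
                partial[receiver] = [
                    op(a, b)
                    for a, b in zip(partial[receiver], partial[sender])
                ]
        stride *= 2
        rounds += 1

    return partial[0], rounds


if __name__ == "__main__":
    # Each simulated process contributes a single-element vector.
    result, rounds = tree_reduce([[rank] for rank in range(8)])
    print(result, rounds)
```

In this simulation the number of rounds grows with the logarithm of the process count, which is one reason the per-round latency matters at the scale the abstract reports.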
- Research Organization:
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
- Sponsoring Organization:
- USDOE
- OSTI ID:
- 976664
- Report Number(s):
- LA-UR-03-3208; TRN: US201017%%808
- Resource Relation:
- Conference: Submitted to: Supercomputing 2003, Phoenix, AZ, November 2003
- Country of Publication:
- United States
- Language:
- English
Similar Records
Bringing large-scale multiple genome analysis one step closer: ScalaBLAST and beyond
Software-Driven Network Architecture for Synchronous Data Acquisition