OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Hardware-and-software-based collective communication on the Quadrics network.

Conference

The efficient implementation of collective communication patterns in a parallel machine is a challenging design effort that requires the solution of many problems. In this paper we present an in-depth description of how the Quadrics network supports both hardware- and software-based collectives. We describe the main features of the two building blocks of this network: a network interface that can perform zero-copy user-level communication, and a wormhole switch. We also focus our attention on the routing and flow control algorithms, on deadlock avoidance, and on how the processing nodes are integrated in a global, virtual shared memory. Experimental results obtained on a 64-node AlphaServer cluster indicate that the time to complete the hardware-based barrier synchronization on the whole network is as low as 6 μs, with very good scalability. Good latency and scalability are also achieved with the software-based synchronization, which takes about 15 μs. With the broadcast, similar performance is achieved by the hardware- and software-based implementations, which can deliver messages of up to 256 bytes in 13 μs and can reach a sustained bandwidth of 288 Mbytes/sec on all the nodes with messages larger than 64KB. The hardware-based barrier is almost insensitive to network congestion, with 93% of the synchronizations taking less than 20 μs. The software-based implementation, on the other hand, suffers from a significant performance degradation. In high-load environments the hardware broadcast maintains reasonably good performance, delivering messages of up to 2KB in 200 μs, while the software broadcast suffers from slightly higher latencies inherited from the synchronization mechanism.
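
To put the broadcast figures quoted in the abstract on a common scale, the following minimal sketch converts them between latency and bandwidth. It is a back-of-the-envelope illustration, not part of the paper: the function names are hypothetical, and it assumes latencies in microseconds, 1 KB = 1024 bytes and 1 Mbyte = 10^6 bytes, none of which the abstract states explicitly. The derived values are simple arithmetic on the quoted numbers, not measured results.

```python
def effective_bandwidth_mbytes_per_s(message_bytes: int, latency_us: float) -> float:
    """Payload delivered per unit time.

    Bytes per microsecond equals Mbytes per second when 1 Mbyte = 10**6 bytes
    (an assumption; the abstract does not define the unit).
    """
    return message_bytes / latency_us


def transfer_time_us(message_bytes: int, bandwidth_mbytes_per_s: float) -> float:
    """Time to stream a message at a given sustained bandwidth, in microseconds."""
    return message_bytes / bandwidth_mbytes_per_s


KB = 1024  # assumption: binary kilobyte

# Figures taken from the abstract (message size, quoted latency or bandwidth).
small_unloaded = effective_bandwidth_mbytes_per_s(256, 13.0)       # 256 bytes in 13 us
medium_loaded = effective_bandwidth_mbytes_per_s(2 * KB, 200.0)    # 2KB in 200 us under load
large_stream_us = transfer_time_us(64 * KB, 288.0)                 # 64KB at 288 Mbytes/sec

if __name__ == "__main__":
    print(f"256-byte broadcast, unloaded: {small_unloaded:.2f} Mbytes/s")
    print(f"2KB broadcast, high load:     {medium_loaded:.2f} Mbytes/s")
    print(f"64KB at sustained bandwidth:  {large_stream_us:.1f} us")
```

Under these assumptions, the small-message cases work out to roughly 20 and 10 Mbytes/s respectively, far below the 288 Mbytes/sec sustained rate. This is consistent with the abstract's distinction between latency-bound short broadcasts and bandwidth-bound messages larger than 64KB, for which streaming alone would take about 228 μs.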

Research Organization:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE
OSTI ID:
975699
Report Number(s):
LA-UR-01-4692; TRN: US201018%%787
Resource Relation:
Conference: Submitted to: NCA 2001, [IEEE International Symposium on Network Computing and Applications, October 2001, Boston].
Country of Publication:
United States
Language:
English