U.S. Department of Energy
Office of Scientific and Technical Information

ConnectX-2 CORE-Direct Enabled Asynchronous Broadcast Collective Communications

Conference
OSTI ID:1014252
This paper describes the design and implementation of InfiniBand (IB) CORE-Direct based blocking and nonblocking broadcast operations within the Cheetah collective operation framework. It describes a novel approach that fully offloads collective operations and employs only user-supplied buffers. For a 64-rank communicator, the latency of the CORE-Direct based hierarchical algorithm is better than that of production-grade Message Passing Interface (MPI) implementations: 150% better than the default Open MPI algorithm and 115% better than the shared-memory optimized MVAPICH implementation for a one kilobyte (KB) message, and 48% and 64% better, respectively, for an eight megabyte (MB) message. The flat-topology broadcast achieves 99.9% overlap in a polling-based communication-computation test and 95.1% overlap in a wait-based test, compared with 92.4% and 17.0%, respectively, for a similar Central Processing Unit (CPU) based implementation.
Research Organization:
Oak Ridge National Laboratory (ORNL)
Sponsoring Organization:
SC USDOE - Office of Science (SC)
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1014252
Country of Publication:
United States
Language:
English

Similar Records

Exploring the All-to-All Collective Optimization Space with ConnectX CORE-Direct
Conference · Sat Sep 01 00:00:00 EDT 2012 · 2012 41st International Conference on Parallel Processing; 10-13 Sept. 2012; Pittsburgh, PA, USA · OSTI ID:1567578

Overlapping Computation and Communication: Barrier Algorithms and ConnectX-2 CORE-Direct Capabilities
Conference · Thu Dec 31 23:00:00 EST 2009 · OSTI ID:982147

Overlapping Computation and Communication: Barrier Algorithms and ConnectX-2 CORE-Direct Capabilities
Conference · Thu Dec 31 23:00:00 EST 2009 · OSTI ID:1003760