Efficient Shared Memory and RDMA based Collectives on Multi-rail QsNetII
Ying Qian, Ahmad Afsahi
Department of Electrical and Computer Engineering
Queen's University, Kingston, ON, Canada K7L 3N6
Clusters of Symmetric Multiprocessors (SMP) are more commonplace than ever for achieving high
performance. Scientific applications running on clusters employ collective communications extensively.
Shared memory communication and Remote Direct Memory Access (RDMA) over multi-rail networks are
promising approaches in addressing the increasing demand on intra-node and inter-node
communications, and thereby in boosting the performance of collectives in emerging multi-core SMP
clusters. In this regard, this paper designs and evaluates two classes of collective communication
algorithms directly at the Elan user-level over multi-rail Quadrics QsNetII with message striping: 1)
RDMA-based traditional multi-port algorithms for gather, all-gather, and all-to-all collectives for
medium to large messages, and 2) RDMA-based and SMP-aware multi-port all-gather algorithms for
small to medium size messages.
The multi-port RDMA-based Direct algorithm for gather and all-to-all collectives gains an