ATCOM: Automatically Tuned Collective Communication System for SMP Clusters
- Iowa State Univ., Ames, IA (United States)
Conventional implementations of collective communications are based on point-to-point communications, and their optimizations have focused on the efficiency of those communication algorithms. However, point-to-point communications are not the optimal choice for modern computing clusters of SMPs because of their two-level communication structure. In recent years, a few research efforts have investigated efficient collective communications for SMP clusters. This dissertation focuses on platform-independent algorithms and implementations in this area. There are two main approaches to implementing efficient collective communications for clusters of SMPs: using shared memory operations for intra-node communications, and overlapping inter-node and intra-node communications. The former fully utilizes the hardware-based shared memory of an SMP, and the latter takes advantage of the inherent hierarchy of the communications within a cluster of SMPs. Previous studies focused on clusters of SMPs from particular vendors, and the methods they proposed are not portable to other systems. Because performance optimization is complicated and development is time-consuming, self-tuning, platform-independent implementations are highly desirable. As demonstrated in this dissertation, such an implementation can significantly outperform other portable, point-to-point-based implementations and some platform-specific implementations. The dissertation describes in detail the architecture of the platform-independent implementation. There are four system components: shared memory-based collective communications, overlapping mechanisms for inter-node and intra-node communications, a prediction-based tuning module, and a micro-benchmark-based tuning module. Each component is designed with the goal of automatic tuning in mind.
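To make the two-level communication structure concrete, the following is a minimal Python sketch of a hierarchical broadcast schedule. It is an illustration only and is not taken from the dissertation: the function name `two_level_broadcast`, the `node_of` rank-to-node mapping, and the leader-selection rule are all assumptions. The sketch only lists which transfers would cross the network and which would be served from shared memory; it does not model the overlapping of the two stages or any tuning.

```python
from collections import defaultdict


def two_level_broadcast(root, node_of):
    """Plan a broadcast as (network_messages, shared_memory_reads).

    node_of is a hypothetical mapping from each rank to the SMP node
    that hosts it. Each returned pair is (source_rank, destination_rank).
    """
    ranks_on = defaultdict(list)
    for rank, node in sorted(node_of.items()):
        ranks_on[node].append(rank)

    # Assumed rule: the root leads its own node; the lowest rank leads the others.
    leaders = {
        node: (root if root in ranks else ranks[0])
        for node, ranks in ranks_on.items()
    }

    # Stage 1 (inter-node): binomial tree among the node leaders only.
    order = [root] + sorted(l for l in leaders.values() if l != root)
    network = []
    have = 1  # how many leaders at the front of `order` already hold the data
    while have < len(order):
        senders = order[:have]
        receivers = order[have:2 * have]
        network.extend(zip(senders, receivers))
        have += len(receivers)

    # Stage 2 (intra-node): each leader publishes once to a shared buffer,
    # and every other rank on that node reads from it.
    shared = [
        (leaders[node], rank)
        for node, ranks in ranks_on.items()
        for rank in ranks
        if rank != leaders[node]
    ]
    return network, shared


if __name__ == "__main__":
    # Hypothetical cluster: 4 nodes with 4 ranks each.
    net, shm = two_level_broadcast(0, {r: r // 4 for r in range(16)})
    print("network messages:", net)
    print("shared-memory reads:", len(shm))
```

In this hypothetical 4-node, 16-rank layout, only the three leader-to-leader transfers cross the network; the remaining twelve ranks obtain the data through shared memory on their own node.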
- Research Organization: Ames Lab., Ames, IA (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: W-7405-ENG-82
- OSTI ID: 861637
- Report Number(s): IS--T 1987
- Country of Publication: United States
- Language: English