Parallelization of the NAS Conjugate Gradient Benchmark Using the Global Arrays Shared Memory Programming Model
The NAS Conjugate Gradient (CG) benchmark is an important scientific kernel used to evaluate machine performance and to compare the characteristics of different programming models. The Global Arrays (GA) toolkit supports a shared-memory programming paradigm, even on distributed-memory systems, and offers the programmer control over the data distribution and locality that are important for optimizing performance on scalable architectures. In this paper, we describe and compare two parallelization strategies for the CG benchmark using GA and report performance results on a shared-memory system as well as on a cluster. The performance benefits of shared memory for irregular, sparse computations have previously been demonstrated in the context of the CG benchmark using OpenMP. Similarly, the GA implementation outperforms the standard MPI implementation on a shared-memory system, in our case the SGI Altix. With GA, however, these benefits extend to distributed-memory systems, as demonstrated on a Linux cluster with Myrinet.
- Research Organization: Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: AC05-76RL01830
- OSTI ID: 914702
- Report Number(s): PNNL-SA-43830; KJ0101030; TRN: US200812%%29
- Resource Relation: Conference: 19th IEEE International Parallel & Distributed Processing Symposium
- Country of Publication: United States
- Language: English