Gaussian techniques on shared-memory multiprocessors
The performance characteristics of numerical algorithms running on single-processor computers are well understood in terms of operation count and vectorization. When examining algorithm performance on a shared-memory multiprocessor, one must also consider the effects of processor synchronization, serial sections, memory access conflicts, and load imbalances. This thesis considers the performance of Gauss and Gauss-Jordan elimination on a shared-memory multiprocessor. Because real multiprocessors with appropriately pipelined functional units and suitably large numbers of processors are not yet available, the Cerberus multiprocessor simulator is used to evaluate algorithm performance. Parallel algorithms commonly use a general-purpose synchronization strategy in which barriers satisfy data dependencies. A barrier requires each processor to wait until all other processors have arrived before execution continues. Barrier synchronization can satisfy most data dependencies, but in many cases it imposes more waiting than the dependencies require. A key result of this work is that a custom synchronization strategy which explicitly exploits the data dependencies of the Gauss elimination algorithm can outperform the generic barrier synchronization strategy without special hardware support for synchronization operations. For multiprocessor algorithms, traditional operation count analysis can be a poor predictor of performance: an algorithm that is a poor choice on a single-processor vector architecture might become the star performer on a multiprocessor. Another result of this work is that Gauss-Jordan elimination, which has an operation count 50% greater than that of Gauss elimination, can outperform Gauss elimination as the number of processors is increased for a fixed problem size.
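The 50% figure quoted above can be checked with a short operation count. The sketch below is illustrative only and is not taken from the thesis: it assumes a dense n x n matrix, counts only multiplications and divisions applied to the matrix itself (no right-hand side, no back substitution, no pivoting), and uses hypothetical function names. Under those assumptions Gauss elimination needs (n^3 - n)/3 operations and Gauss-Jordan elimination needs (n^3 - n)/2, a ratio of exactly 1.5.

```python
def gauss_op_count(n):
    """Multiply/divide count for Gauss elimination on an n x n matrix.

    Assumption: only the reduction to upper-triangular form is counted.
    """
    count = 0
    for k in range(n - 1):             # pivot column
        for _ in range(k + 1, n):      # rows below the pivot only
            count += 1                 # one division to form the multiplier
            count += n - k - 1         # update the remaining entries of the row
    return count


def gauss_jordan_op_count(n):
    """Multiply/divide count for Gauss-Jordan elimination on an n x n matrix.

    Assumption: every row except the pivot row is updated at each step,
    so the matrix is reduced to diagonal form.
    """
    count = 0
    for k in range(n):                 # pivot column
        for i in range(n):             # rows above and below the pivot
            if i != k:
                count += 1             # one division to form the multiplier
                count += n - k - 1     # update the remaining entries of the row
    return count


if __name__ == "__main__":
    for n in (10, 100, 1000):
        g = gauss_op_count(n)
        gj = gauss_jordan_op_count(n)
        print(n, g, gj, gj / g)
```

The extra work in Gauss-Jordan comes from updating the rows above the pivot as well as those below it. That also means every pivot step updates the same number of rows (n - 1), whereas in Gauss elimination the number of rows to update shrinks at each step. This is one plausible reason the load-balance picture can differ between the two algorithms as processors are added; the abstract itself does not state the mechanism.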
- Research Organization: Lawrence Livermore National Lab., CA (USA)
- DOE Contract Number: W-7405-ENG-48
- OSTI ID: 5113829
- Report Number(s): UCRL-53863; ON: DE88010594
- Resource Relation: Other Information: Thesis (M.S.). Submitted to Univ. of California, Davis
- Country of Publication: United States
- Language: English