Use of Level 3 BLAS in LU factorization in a multiprocessing environment on three vector multiprocessors: the Alliant FX/80, the CRAY-2, and the IBM 3090 VF
- CERFACS, Toulouse (FR)
The authors study various implementations of block Gaussian elimination of full matrices and examine their performance on three parallel computers: the Alliant FX/80, the CRAY-2, and the IBM 3090-400/VF. These implementations are expressed in terms of Level 3 BLAS matrix-matrix kernels. The authors consider the use of parallel Level 3 BLAS kernels and compare the parallelism obtained within the computational kernels with that obtained when parallelizing over the kernels. They show that the use of parallel Level 3 BLAS allows portability without sacrifice of efficiency, even in a parallel environment, and that high speeds can be obtained if tuned versions of the kernels are available.
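As a rough illustration of what "expressed in terms of Level 3 BLAS kernels" can mean, the sketch below shows one common right-looking form of block LU factorization in plain Python. It is not taken from the paper: the function names, the block size, and the omission of pivoting (safe here only because the sample matrix is assumed to be diagonally dominant) are all assumptions made for the example. The two triangular solves stand in for a TRSM-style kernel and the trailing update stands in for a GEMM-style kernel.

```python
def lu_unblocked(A, lo, hi):
    """Factor the diagonal block A[lo:hi, lo:hi] in place (no pivoting)."""
    for j in range(lo, hi):
        for i in range(j + 1, hi):
            A[i][j] /= A[j][j]
            for c in range(j + 1, hi):
                A[i][c] -= A[i][j] * A[j][c]


def trsm_lower_unit_left(A, lo, hi, n):
    """TRSM-like step: overwrite A[lo:hi, hi:n] with inv(L11) * A12."""
    for c in range(hi, n):
        for i in range(lo, hi):
            for k in range(lo, i):
                A[i][c] -= A[i][k] * A[k][c]


def trsm_upper_right(A, lo, hi, n):
    """TRSM-like step: overwrite A[hi:n, lo:hi] with A21 * inv(U11)."""
    for r in range(hi, n):
        for j in range(lo, hi):
            for k in range(lo, j):
                A[r][j] -= A[r][k] * A[k][j]
            A[r][j] /= A[j][j]


def gemm_update(A, lo, hi, n):
    """GEMM-like step: A22 -= L21 * U12 on the trailing submatrix.

    In a tuned library this matrix-matrix product is where most of the
    arithmetic happens, so it is the natural place for a parallel kernel.
    """
    for r in range(hi, n):
        for c in range(hi, n):
            s = 0.0
            for k in range(lo, hi):
                s += A[r][k] * A[k][c]
            A[r][c] -= s


def block_lu(A, block=2):
    """Right-looking block LU without pivoting; L and U overwrite A."""
    n = len(A)
    for lo in range(0, n, block):
        hi = min(lo + block, n)
        lu_unblocked(A, lo, hi)             # factor diagonal block
        trsm_lower_unit_left(A, lo, hi, n)  # block row of U
        trsm_upper_right(A, lo, hi, n)      # block column of L
        gemm_update(A, lo, hi, n)           # update trailing matrix
    return A


def reconstruct(LU):
    """Multiply the packed unit-lower L and upper U back together."""
    n = len(LU)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else LU[i][k]
                s += l_ik * LU[k][j]
            out[i][j] = s
    return out


# Hypothetical diagonally dominant test matrix (an assumption of this sketch).
A0 = [
    [10.0, 1.0, 2.0, 0.0, 1.0],
    [1.0, 12.0, 0.0, 3.0, 1.0],
    [2.0, 0.0, 9.0, 1.0, 1.0],
    [0.0, 3.0, 1.0, 11.0, 2.0],
    [1.0, 1.0, 1.0, 2.0, 8.0],
]
LU = block_lu([row[:] for row in A0], block=2)
R = reconstruct(LU)
max_err = max(abs(R[i][j] - A0[i][j]) for i in range(5) for j in range(5))
print("max reconstruction error:", max_err)
```

In this form, "parallelism within the kernels" would correspond to running each TRSM-like or GEMM-like call in parallel internally, while "parallelizing over the kernels" would correspond to issuing several independent kernel calls on different blocks at once. The sketch does neither; it only shows the sequence of calls.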
- OSTI ID:
- 5545213
- Journal Information:
- International Journal of Supercomputer Applications (United States); Vol. 5:3; ISSN 0890-2720; ISSN IJSAE
- Country of Publication:
- United States
- Language:
- English
Similar Records
Level 3 BLAS in LU factorization on the CRAY-2, ETA-10P, and IBM 3090-200/VF
Vectorization of a multiprocessor multifrontal code