The design and implementation of the parallel out-of-core ScaLAPACK LU, QR and Cholesky factorization routines
This paper describes the design and implementation of three core factorization routines--LU, QR and Cholesky--included in the out-of-core extension of ScaLAPACK. These routines allow the factorization and solution of a dense system that is too large to fit entirely in physical memory. An image of the full matrix is maintained on disk and the factorization routines transfer sub-matrices into memory. The left-looking column-oriented variant of the factorization algorithm is implemented to reduce the disk I/O traffic. The routines are implemented using a portable I/O interface and utilize high performance ScaLAPACK factorization routines as in-core computational kernels. The authors present the details of the implementation for the out-of-core ScaLAPACK factorization routines, as well as performance and scalability results on the Intel Paragon.
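The left-looking, column-oriented scheme in the abstract can be illustrated with a small sketch for the Cholesky case. This is not the ScaLAPACK out-of-core code: it is a serial, pure-Python illustration that assumes a symmetric positive definite matrix stored on disk column by column as 64-bit floats. The names `write_col`, `read_col`, `ooc_cholesky` and `factor_on_disk` are hypothetical helpers introduced here. Only one panel of `nb` columns plus one previously factored column is held in memory at a time, and each panel is written to disk once, after it is fully factored.

```python
import math
import tempfile
from array import array

def write_col(f, n, j, col):
    """Write column j (n doubles) of the on-disk matrix."""
    f.seek(j * n * 8)
    f.write(array("d", col).tobytes())

def read_col(f, n, j):
    """Read column j (n doubles) of the on-disk matrix."""
    f.seek(j * n * 8)
    col = array("d")
    col.frombytes(f.read(n * 8))
    return list(col)

def ooc_cholesky(f, n, nb):
    """Left-looking panel Cholesky: overwrite the lower triangle on disk with L."""
    for j0 in range(0, n, nb):
        j1 = min(j0 + nb, n)
        # Bring the current panel of columns into memory.
        panel = [read_col(f, n, j) for j in range(j0, j1)]
        # Left-looking step: apply every previously factored column,
        # read back from disk, to the in-memory panel.
        for p in range(j0):
            lp = read_col(f, n, p)
            for c, j in enumerate(range(j0, j1)):
                ljp = lp[j]
                col = panel[c]
                for i in range(j, n):
                    col[i] -= lp[i] * ljp
        # In-core factorization of the panel (stand-in for an in-core kernel).
        for c, j in enumerate(range(j0, j1)):
            col = panel[c]
            d = math.sqrt(col[j])
            for i in range(j, n):
                col[i] /= d
            for c2 in range(c + 1, j1 - j0):
                j2 = j0 + c2
                ljj = col[j2]
                col2 = panel[c2]
                for i in range(j2, n):
                    col2[i] -= col[i] * ljj
        # Write the finished panel back; it is never modified again.
        for c, j in enumerate(range(j0, j1)):
            write_col(f, n, j, panel[c])

def factor_on_disk(A, nb):
    """Store A in a temporary file, factor it out of core, return L as nested lists."""
    n = len(A)
    with tempfile.TemporaryFile() as f:
        for j in range(n):
            write_col(f, n, j, [A[i][j] for i in range(n)])
        ooc_cholesky(f, n, nb)
        L = [[0.0] * n for _ in range(n)]
        for j in range(n):
            col = read_col(f, n, j)
            for i in range(j, n):
                L[i][j] = col[i]
    return L
```

In this sketch the factored columns are read repeatedly but each panel is written only once, which is the I/O trade-off the left-looking variant is chosen for. A production code would read earlier panels in blocks and delegate the in-core work to optimized kernels rather than Python loops.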
- Research Organization: Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization: USDOE, Washington, DC (United States); USDOE Office of Energy Research, Washington, DC (United States); National Science Foundation, Washington, DC (United States); Defense Advanced Research Projects Agency, Arlington, VA (United States)
- DOE Contract Number: AC05-96OR22464
- OSTI ID: 296722
- Report Number(s): ORNL/TM-13372; R&D Project: 4AC; ON: DE98054626; BR: 11A400301; CNN: Grant ASC-9005933; Contract DAAL03-91-C-0047; Agreement CCR-8809615; TRN: AHC29903%%120
- Resource Relation: Other Information: PBD: Apr 1997
- Country of Publication: United States
- Language: English