QR factorization of a dense matrix on a shared-memory multiprocessor
A new algorithm for computing an orthogonal decomposition of a rectangular $m \times n$ matrix $A$ on a shared-memory parallel computer is described. The algorithm uses Givens rotations and has a low synchronization cost. In particular, for a multiprocessor with $p$ processors, an analysis of the algorithm shows that this cost is $O(n^2/p)$ if $m/p \ge n$, and $O(mn/p^2)$ if $m/p < n$. In the latter case the synchronization cost is smaller than $O(n^2/p)$, since $m/p < n$ implies $mn/p^2 < n^2/p$. The synchronization cost of the proposed algorithm is therefore bounded by $O(n^2/p)$ when $m \ge n$. This is important for machines where synchronization is expensive and for problems where $m \gg n$. Analysis and experiments show that the algorithm is effective in balancing the load and producing high efficiency (speed-up). 13 refs.
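As a rough illustration of the two ingredients named in the abstract, the sketch below reduces a matrix to upper-triangular form with Givens rotations and evaluates the leading term of the stated synchronization bound. It is a sequential, minimal sketch and not the parallel algorithm of the report: the function names `givens`, `givens_qr_r`, and `sync_cost_bound` are hypothetical, the elimination order is just one common choice, and the constants hidden by the O-notation are assumed to be 1.

```python
import math

def givens(a, b):
    """Return (c, s) so that the rotation [[c, s], [-s, c]] maps (a, b) to (r, 0)."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, b / r

def givens_qr_r(A):
    """Return the upper-triangular factor R of an m x n matrix given as a list of rows.

    Sequential sketch only; the report's contribution is how such rotations
    are scheduled across processors, which is not reproduced here.
    """
    R = [[float(x) for x in row] for row in A]
    m = len(R)
    n = len(R[0]) if m else 0
    for j in range(min(m, n)):
        # Zero the entries below the diagonal in column j, working bottom up
        # and rotating each row against the row directly above it.
        for i in range(m - 1, j, -1):
            c, s = givens(R[i - 1][j], R[i][j])
            for k in range(j, n):
                upper, lower = R[i - 1][k], R[i][k]
                R[i - 1][k] = c * upper + s * lower
                R[i][k] = -s * upper + c * lower
    return R

def sync_cost_bound(m, n, p):
    """Leading term of the synchronization cost quoted in the abstract (constant assumed 1)."""
    if m / p >= n:
        return n * n / p
    return m * n / (p * p)
```

For example, `sync_cost_bound(1000, 10, 4)` falls in the $m/p \ge n$ regime and evaluates $n^2/p$, while `sync_cost_bound(16, 10, 4)` falls in the $m/p < n$ regime and evaluates the smaller $mn/p^2$ term.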
- Research Organization: Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- DOE Contract Number: AC05-84OR21400
- OSTI ID: 5928811
- Report Number(s): ORNL/TM-10581; ON: DE88001506
- Resource Relation: Other Information: Portions of this document are illegible in microfiche products. Original copy available until stock is exhausted
- Country of Publication: United States
- Language: English
Similar Records
QR factorization of a dense matrix on a hypercube multiprocessor
Gaussian techniques on shared memory multiprocessor computers