Reconstructing Householder vectors from Tall-Skinny QR
Abstract
The Tall-Skinny QR (TSQR) algorithm is more communication efficient than the standard Householder algorithm for QR decomposition of matrices with many more rows than columns. However, TSQR produces a different representation of the orthogonal factor and therefore requires more software development to support the new representation. Further, implicitly applying the orthogonal factor to the trailing matrix in the context of factoring a square matrix is more complicated and costly than with the Householder representation. We show how to perform TSQR and then reconstruct the Householder vector representation with the same asymptotic communication efficiency and little extra computational cost. We demonstrate the high performance and numerical stability of this algorithm both theoretically and empirically. The new Householder reconstruction algorithm allows us to design more efficient parallel QR algorithms, with significantly lower latency cost compared to Householder QR and lower bandwidth and latency costs compared with the Communication-Avoiding QR (CAQR) algorithm. Experiments on supercomputers demonstrate the benefits of the communication cost improvements: in particular, our experiments show substantial improvements over tuned library implementations for tall-and-skinny matrices. Furthermore, we also provide algorithmic improvements to the Householder QR and CAQR algorithms, and we investigate several alternatives to the Householder reconstruction algorithm that sacrifice guarantees on numerical stability in some cases in order to obtain higher performance.
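The abstract describes the approach only at a high level. As a rough illustration, the NumPy sketch below runs a serial TSQR over row blocks and then recovers a Householder-style representation (Y, T, S) from the explicit thin Q. The function names `tsqr` and `reconstruct_householder`, the flat one-level reduction tree, the block count, and the sign-adjusted unpivoted LU used for the recovery are all assumptions made for this example. It is a single-process sketch and should not be read as the authors' parallel implementation.

```python
import numpy as np


def tsqr(A, num_blocks=4):
    """Thin QR of a tall-skinny A using a flat (one-level) reduction tree."""
    m, n = A.shape
    if m // num_blocks < n:
        raise ValueError("each row block needs at least n rows")
    blocks = np.array_split(A, num_blocks, axis=0)
    # Step 1: independent local QR factorizations of the row blocks.
    local = [np.linalg.qr(B) for B in blocks]
    # Step 2: stack the small n-by-n R factors and factor them once more.
    Q2, R = np.linalg.qr(np.vstack([Ri for _, Ri in local]))
    # Step 3: form the explicit thin Q by combining local and top-level factors.
    Q = np.vstack([Qi @ Q2[i * n:(i + 1) * n] for i, (Qi, _) in enumerate(local)])
    return Q, R


def reconstruct_householder(Q):
    """Return Y, T, S with Q @ diag(S) == [I; 0] - Y @ T @ Y[:n].T.

    Y is unit lower trapezoidal, T is upper triangular, S holds +/-1 signs.
    Uses an unpivoted LU of Q - [diag(S); 0], choosing each sign so that
    every pivot has magnitude at least 1.
    """
    m, n = Q.shape
    W = np.array(Q, dtype=float)
    S = np.empty(n)
    for j in range(n):
        S[j] = -1.0 if W[j, j] >= 0 else 1.0   # opposite sign to the pivot
        W[j, j] -= S[j]                         # now |W[j, j]| >= 1
        W[j + 1:, j] /= W[j, j]                 # column of Y below the diagonal
        W[j + 1:, j + 1:] -= np.outer(W[j + 1:, j], W[j, j + 1:])
    Y = np.tril(W, -1)
    Y[np.arange(n), np.arange(n)] = 1.0
    U = np.triu(W[:n])
    # From U = -T @ Y1.T @ diag(S): solve Y1 @ T.T = -(U @ diag(S)).T for T.
    T = np.linalg.solve(Y[:n], -(U * S).T).T
    return Y, T, S
```

With these definitions, `Q, R = tsqr(A)` followed by `Y, T, S = reconstruct_householder(Q)` gives an orthogonal matrix `I - Y @ T @ Y.T` whose first n columns, multiplied by `S[:, None] * R`, reproduce `A` up to rounding error. Note that the triangular factor paired with the reconstructed vectors is the sign-adjusted `S[:, None] * R`, not `R` itself.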
- Authors:
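- Ballard, Grey Malone; Demmel, James; Grigori, Laura; Jacquelin, Mathias; Knight, Nicholas; Nguyen, Hong Diep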
- Sandia National Lab. (SNL-CA), Livermore, CA (United States)
- Univ. of California, Berkeley, CA (United States)
- INRIA Paris, Rocquencourt (France)
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
- Publication Date:
- 2015-08-05
- Research Org.:
- Sandia National Lab. (SNL-CA), Livermore, CA (United States)
- Sponsoring Org.:
- USDOE National Nuclear Security Administration (NNSA)
- OSTI Identifier:
- 1236219
- Alternate Identifier(s):
- OSTI ID: 1250173
- Report Number(s):
- SAND-2015-1977J; Journal ID: ISSN 0743-7315; 579371
- Grant/Contract Number:
- AC04-94AL85000; AC02-05CH11231; SC0008700; SC0010200
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Journal of Parallel and Distributed Computing
- Additional Journal Information:
- Journal Volume: 85; Journal Issue: C; Journal ID: ISSN 0743-7315
- Publisher:
- Elsevier
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; QR decomposition; dense linear algebra; communication-avoiding algorithms
Citation Formats
Ballard, Grey Malone, Demmel, James, Grigori, Laura, Jacquelin, Mathias, Knight, Nicholas, and Nguyen, Hong Diep. Reconstructing Householder vectors from Tall-Skinny QR. United States: N. p., 2015. Web. doi:10.1016/j.jpdc.2015.06.003.
Ballard, Grey Malone, Demmel, James, Grigori, Laura, Jacquelin, Mathias, Knight, Nicholas, & Nguyen, Hong Diep. Reconstructing Householder vectors from Tall-Skinny QR. United States. https://doi.org/10.1016/j.jpdc.2015.06.003
Ballard, Grey Malone, Demmel, James, Grigori, Laura, Jacquelin, Mathias, Knight, Nicholas, and Nguyen, Hong Diep. 2015. "Reconstructing Householder vectors from Tall-Skinny QR". United States. https://doi.org/10.1016/j.jpdc.2015.06.003. https://www.osti.gov/servlets/purl/1236219.
@article{osti_1236219,
title = {Reconstructing Householder vectors from Tall-Skinny QR},
author = {Ballard, Grey Malone and Demmel, James and Grigori, Laura and Jacquelin, Mathias and Knight, Nicholas and Nguyen, Hong Diep},
abstractNote = {The Tall-Skinny QR (TSQR) algorithm is more communication efficient than the standard Householder algorithm for QR decomposition of matrices with many more rows than columns. However, TSQR produces a different representation of the orthogonal factor and therefore requires more software development to support the new representation. Further, implicitly applying the orthogonal factor to the trailing matrix in the context of factoring a square matrix is more complicated and costly than with the Householder representation. We show how to perform TSQR and then reconstruct the Householder vector representation with the same asymptotic communication efficiency and little extra computational cost. We demonstrate the high performance and numerical stability of this algorithm both theoretically and empirically. The new Householder reconstruction algorithm allows us to design more efficient parallel QR algorithms, with significantly lower latency cost compared to Householder QR and lower bandwidth and latency costs compared with the Communication-Avoiding QR (CAQR) algorithm. Experiments on supercomputers demonstrate the benefits of the communication cost improvements: in particular, our experiments show substantial improvements over tuned library implementations for tall-and-skinny matrices. Furthermore, we also provide algorithmic improvements to the Householder QR and CAQR algorithms, and we investigate several alternatives to the Householder reconstruction algorithm that sacrifice guarantees on numerical stability in some cases in order to obtain higher performance.},
doi = {10.1016/j.jpdc.2015.06.003},
journal = {Journal of Parallel and Distributed Computing},
number = {C},
volume = {85},
place = {United States},
year = {2015},
month = {8}
}
Works referenced in this record:
A blocked QR-decomposition for the parallel symmetric eigenvalue problem
journal, July 2014
- Auckenthaler, T.; Huckle, T.; Wittmann, R.
- Parallel Computing, Vol. 40, Issue 7
Minimizing Communication in Numerical Linear Algebra
journal, July 2011
- Ballard, Grey; Demmel, James; Holtz, Olga
- SIAM Journal on Matrix Analysis and Applications, Vol. 32, Issue 3
Collective communication: theory, practice, and experience
journal, January 2007
- Chan, Ernie; Heimlich, Marcel; Purkayastha, Avi
- Concurrency and Computation: Practice and Experience, Vol. 19, Issue 13
Elemental: A New Framework for Distributed Memory Dense Matrix Computations
journal, February 2013
- Poulson, Jack; Marker, Bryan; van de Geijn, Robert A.
- ACM Transactions on Mathematical Software, Vol. 39, Issue 2
Communication-optimal Parallel and Sequential QR and LU Factorizations
journal, January 2012
- Demmel, James; Grigori, Laura; Hoemmen, Mark
- SIAM Journal on Scientific Computing, Vol. 34, Issue 1
Hierarchical QR factorization algorithms for multi-core clusters
journal, April 2013
- Dongarra, Jack; Faverge, Mathieu; Hérault, Thomas
- Parallel Computing, Vol. 39, Issue 4-5
Broadcast Time in Communication Networks
journal, October 1980
- Farley, Arthur M.
- SIAM Journal on Applied Mathematics, Vol. 39, Issue 2
The WY Representation for Products of Householder Matrices
journal, January 1987
- Bischof, Christian; Van Loan, Charles
- SIAM Journal on Scientific and Statistical Computing, Vol. 8, Issue 1
A Storage-Efficient $WY$ Representation for Products of Householder Transformations
journal, January 1989
- Schreiber, Robert; Van Loan, Charles
- SIAM Journal on Scientific and Statistical Computing, Vol. 10, Issue 1
Backward error analysis of the AllReduce algorithm for householder QR decomposition
journal, December 2011
- Mori, Daisuke; Yamamoto, Yusaku; Zhang, Shao-Liang
- Japan Journal of Industrial and Applied Mathematics, Vol. 29, Issue 1
Modification of the Householder Method Based on the Compact WY Representation
journal, May 1992
- Puglisi, Chiara
- SIAM Journal on Scientific and Statistical Computing, Vol. 13, Issue 3
Block Reflectors: Theory and Computation
journal, February 1988
- Schreiber, Robert; Parlett, Beresford
- SIAM Journal on Numerical Analysis, Vol. 25, Issue 1
A Basis-Kernel Representation of Orthogonal Matrices
journal, October 1995
- Sun, Xiaobai; Bischof, Christian
- SIAM Journal on Matrix Analysis and Applications, Vol. 16, Issue 4
Optimization of Collective Communication Operations in MPICH
journal, February 2005
- Thakur, Rajeev; Rabenseifner, Rolf; Gropp, William
- The International Journal of High Performance Computing Applications, Vol. 19, Issue 1
Communication-efficient parallel generic pairwise elimination
journal, February 2007
- Tiskin, Alexander
- Future Generation Computer Systems, Vol. 23, Issue 2
Works referencing / citing this record:
Tall-and-skinny QR factorization with approximate Householder reflectors on graphics processors
journal, January 2020
- Tomás, Andrés E.; Quintana-Ortí, Enrique S.
- The Journal of Supercomputing, Vol. 76, Issue 11
Look-ahead in the two-sided reduction to compact band forms for symmetric eigenvalue problems and the SVD
journal, March 2018
- Rodríguez-Sánchez, Rafael; Catalán, Sandra; Herrero, José R.
- Numerical Algorithms, Vol. 80, Issue 2
Scaling Up Parallel Computation of Tiled QR Factorizations by a Distributed Scheduling Runtime System and Analytical Modeling
journal, March 2018
- Zheng, Weijian; Song, Fengguang; Lin, Lan
- Parallel Processing Letters, Vol. 28, Issue 01
Numerical algorithms for high-performance computational science
journal, January 2020
- Dongarra, Jack; Grigori, Laura; Higham, Nicholas J.
- Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, Vol. 378, Issue 2166
A Communication-Avoiding Parallel Algorithm for the Symmetric Eigenvalue Problem
conference, July 2017
- Solomonik, Edgar; Ballard, Grey; Demmel, James
- Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures
Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies
preprint, January 2016
- Gittens, Alex; Devarakonda, Aditya; Racah, Evan
- arXiv