Title: Reconstructing Householder vectors from Tall-Skinny QR

Journal Article · Journal of Parallel and Distributed Computing
 [1];  [2];  [3];  [4];  [2];  [2]
  1. Sandia National Lab. (SNL-CA), Livermore, CA (United States)
  2. Univ. of California, Berkeley, CA (United States)
  3. INRIA Paris, Rocquencourt (France)
  4. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

The Tall-Skinny QR (TSQR) algorithm is more communication-efficient than the standard Householder algorithm for QR decomposition of matrices with many more rows than columns. However, TSQR produces a different representation of the orthogonal factor and therefore requires more software development to support the new representation. Further, implicitly applying the orthogonal factor to the trailing matrix in the context of factoring a square matrix is more complicated and costly than with the Householder representation. We show how to perform TSQR and then reconstruct the Householder vector representation with the same asymptotic communication efficiency and little extra computational cost. We demonstrate the high performance and numerical stability of this algorithm both theoretically and empirically. The new Householder reconstruction algorithm allows us to design more efficient parallel QR algorithms, with significantly lower latency cost compared with Householder QR and lower bandwidth and latency costs compared with the Communication-Avoiding QR (CAQR) algorithm. Experiments on supercomputers demonstrate the benefits of the communication cost improvements: in particular, our experiments show substantial improvements over tuned library implementations for tall-and-skinny matrices. We also provide algorithmic improvements to the Householder QR and CAQR algorithms, and we investigate several alternatives to the Householder reconstruction algorithm that sacrifice guarantees on numerical stability in some cases in order to obtain higher performance.
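
The idea in the abstract can be illustrated with a small sequential sketch. It is not the authors' parallel implementation: the function names `tsqr` and `reconstruct_householder` and the parameter `num_blocks` are hypothetical, the reduction tree has a single level, and everything runs in one process with NumPy. The reconstruction step assumes the approach of computing an unpivoted LU factorization of Q - S, where S is a diagonal matrix of signs chosen during elimination, so that the unit lower trapezoidal factor serves as the Householder vectors Y and the triangular factor T follows from U.

```python
import numpy as np

def tsqr(A, num_blocks=4):
    """One-level TSQR sketch: QR each row block, then QR the stacked R factors.
    Assumes every row block has at least as many rows as A has columns."""
    blocks = np.array_split(A, num_blocks, axis=0)
    local = [np.linalg.qr(B) for B in blocks]           # reduced QR per block
    R_stack = np.vstack([R for _, R in local])          # (num_blocks*n) x n
    Q2, R = np.linalg.qr(R_stack)                       # combine step
    Q2_blocks = np.split(Q2, num_blocks, axis=0)        # n rows per block
    # Form the thin Q explicitly (only for illustration).
    Q = np.vstack([Qi @ Q2i for (Qi, _), Q2i in zip(local, Q2_blocks)])
    return Q, R

def reconstruct_householder(Q):
    """Given Q (m x n, orthonormal columns), return Y, T, S such that
    (I - Y T Y^T)[:, :n] equals Q with column j scaled by S[j]."""
    m, n = Q.shape
    W = Q.copy()
    S = np.zeros(n)
    for i in range(n):
        # Pick the sign so the shifted pivot has magnitude >= 1 (no pivoting needed).
        S[i] = -1.0 if W[i, i] >= 0 else 1.0
        W[i, i] -= S[i]
        W[i + 1:, i] /= W[i, i]                          # column of L
        W[i + 1:, i + 1:] -= np.outer(W[i + 1:, i], W[i, i + 1:])
    Y = np.tril(W, -1)
    Y[np.arange(n), np.arange(n)] = 1.0                  # unit lower trapezoidal
    U = np.triu(W[:n, :])
    Y1 = Y[:n, :]
    # T = -U S Y1^{-T}, computed via a solve instead of an explicit inverse.
    T = np.linalg.solve(Y1, -(U * S).T).T
    return Y, T, S

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 8))
    Q, R = tsqr(A)
    Y, T, S = reconstruct_householder(Q)
    H = np.eye(A.shape[0]) - Y @ T @ Y.T                 # full orthogonal factor
    R_hh = S[:, None] * R                                # sign-adjusted R
    print("orthogonality error:", np.linalg.norm(H.T @ H - np.eye(200)))
    print("factorization error:", np.linalg.norm(H[:, :8] @ R_hh - A))
```

In this sketch Q is formed explicitly for clarity. A distributed version would keep the per-block factors on separate processes, and the LU step would be the only part that touches the assembled Q.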

Research Organization:
Sandia National Lab. (SNL-CA), Livermore, CA (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
AC04-94AL85000; AC02-05CH11231; SC0008700; SC0010200
OSTI ID:
1236219
Alternate ID(s):
OSTI ID: 1250173
Report Number(s):
SAND-2015-1977J; 579371
Journal Information:
Journal of Parallel and Distributed Computing, Vol. 85, Issue C; ISSN 0743-7315
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 10 works
Citation information provided by
Web of Science

Cited By (6)

Tall-and-skinny QR factorization with approximate Householder reflectors on graphics processors journal January 2020
Look-ahead in the two-sided reduction to compact band forms for symmetric eigenvalue problems and the SVD journal March 2018
Scaling Up Parallel Computation of Tiled QR Factorizations by a Distributed Scheduling Runtime System and Analytical Modeling journal March 2018
Numerical algorithms for high-performance computational science journal January 2020
  • Dongarra, Jack; Grigori, Laura; Higham, Nicholas J.
  • Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, Vol. 378, Issue 2166 https://doi.org/10.1098/rsta.2019.0066
A Communication-Avoiding Parallel Algorithm for the Symmetric Eigenvalue Problem conference July 2017
Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies preprint January 2016