DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Reconstructing Householder vectors from Tall-Skinny QR

Abstract

The Tall-Skinny QR (TSQR) algorithm is more communication efficient than the standard Householder algorithm for QR decomposition of matrices with many more rows than columns. However, TSQR produces a different representation of the orthogonal factor and therefore requires more software development to support the new representation. Further, implicitly applying the orthogonal factor to the trailing matrix in the context of factoring a square matrix is more complicated and costly than with the Householder representation. We show how to perform TSQR and then reconstruct the Householder vector representation with the same asymptotic communication efficiency and little extra computational cost. We demonstrate the high performance and numerical stability of this algorithm both theoretically and empirically. The new Householder reconstruction algorithm allows us to design more efficient parallel QR algorithms, with significantly lower latency cost compared to Householder QR and lower bandwidth and latency costs compared with the Communication-Avoiding QR (CAQR) algorithm. Experiments on supercomputers demonstrate the benefits of the communication cost improvements: in particular, our experiments show substantial improvements over tuned library implementations for tall-and-skinny matrices. Furthermore, we also provide algorithmic improvements to the Householder QR and CAQR algorithms, and we investigate several alternatives to the Householder reconstruction algorithm that sacrifice guarantees on numerical stability in some cases in order to obtain higher performance.
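The two steps named in the abstract, TSQR followed by recovery of a Householder representation, can be illustrated with a small sequential sketch. It is not the authors' parallel implementation. It assumes NumPy, a single-level (flat-tree) TSQR, and a full-rank input. The abstract does not spell out the reconstruction method, so the route used here is an assumption: an LU factorization without pivoting of Q minus a diagonal sign matrix S, which yields unit lower trapezoidal Y and upper triangular T with Q S equal to the leading columns of I - Y T Yᵀ. The function names `tsqr` and `reconstruct_householder` are hypothetical.

```python
import numpy as np

def tsqr(A, num_blocks=4):
    """Flat-tree TSQR: QR each row block, then QR the stacked R factors."""
    b = A.shape[1]
    blocks = np.array_split(A, num_blocks, axis=0)  # each block needs >= b rows
    local = [np.linalg.qr(blk) for blk in blocks]   # reduced QR per block
    Q2, R = np.linalg.qr(np.vstack([Ri for _, Ri in local]))
    # Form the explicit m-by-b orthonormal factor from the two tree levels.
    Q = np.vstack([Qi @ Q2[i * b:(i + 1) * b] for i, (Qi, _) in enumerate(local)])
    return Q, R

def reconstruct_householder(Q):
    """Assumed approach: LU without pivoting of Q - [S; 0], signs picked on the fly."""
    m, b = Q.shape
    W = Q.copy()
    S = np.zeros(b)
    for i in range(b):
        # Choose the sign opposite to the current diagonal entry,
        # so the pivot W[i, i] - S[i] has magnitude at least 1.
        S[i] = -1.0 if W[i, i] >= 0 else 1.0
        W[i, i] -= S[i]
        W[i + 1:, i] /= W[i, i]                                   # column of Y
        W[i + 1:, i + 1:] -= np.outer(W[i + 1:, i], W[i, i + 1:])  # Schur update
    Y = np.tril(W, -1)
    Y[np.arange(b), np.arange(b)] = 1.0   # unit lower trapezoidal
    U = np.triu(W[:b])                    # upper triangular LU factor
    # T = -U S Y1^{-T}, computed by solving Y1 T^T = -(U S)^T.
    T = np.linalg.solve(Y[:b], -(U * S).T).T
    return Y, T, S

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 5))
Q, R = tsqr(A)
Y, T, S = reconstruct_householder(Q)
H = np.eye(64) - Y @ T @ Y.T   # full orthogonal factor in compact WY form
```

In this sketch the triangular factor matching the Householder form is `S[:, None] * R`, since the reconstructed orthogonal factor reproduces Q only up to the column signs in S.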

Authors:
 [1];  [2];  [3];  [4];  [2];  [2]
  1. Sandia National Lab. (SNL-CA), Livermore, CA (United States)
  2. Univ. of California, Berkeley, CA (United States)
  3. INRIA Paris, Rocquencourt (France)
  4. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Publication Date:
2015-08-05
Research Org.:
Sandia National Lab. (SNL-CA), Livermore, CA (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA)
OSTI Identifier:
1236219
Alternate Identifier(s):
OSTI ID: 1250173
Report Number(s):
SAND-2015-1977J
Journal ID: ISSN 0743-7315; 579371
Grant/Contract Number:  
AC04-94AL85000; AC02-05CH11231; SC0008700; SC0010200
Resource Type:
Accepted Manuscript
Journal Name:
Journal of Parallel and Distributed Computing
Additional Journal Information:
Journal Volume: 85; Journal Issue: C; Journal ID: ISSN 0743-7315
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; QR decomposition; dense linear algebra; communication-avoiding algorithms

Citation Formats

Ballard, Grey Malone, Demmel, James, Grigori, Laura, Jacquelin, Mathias, Knight, Nicholas, and Nguyen, Hong Diep. Reconstructing Householder vectors from Tall-Skinny QR. United States: N. p., 2015. Web. doi:10.1016/j.jpdc.2015.06.003.
Ballard, Grey Malone, Demmel, James, Grigori, Laura, Jacquelin, Mathias, Knight, Nicholas, & Nguyen, Hong Diep. Reconstructing Householder vectors from Tall-Skinny QR. United States. https://doi.org/10.1016/j.jpdc.2015.06.003
Ballard, Grey Malone, Demmel, James, Grigori, Laura, Jacquelin, Mathias, Knight, Nicholas, and Nguyen, Hong Diep. 2015. "Reconstructing Householder vectors from Tall-Skinny QR". United States. https://doi.org/10.1016/j.jpdc.2015.06.003. https://www.osti.gov/servlets/purl/1236219.
@article{osti_1236219,
title = {Reconstructing Householder vectors from Tall-Skinny QR},
author = {Ballard, Grey Malone and Demmel, James and Grigori, Laura and Jacquelin, Mathias and Knight, Nicholas and Nguyen, Hong Diep},
abstractNote = {The Tall-Skinny QR (TSQR) algorithm is more communication efficient than the standard Householder algorithm for QR decomposition of matrices with many more rows than columns. However, TSQR produces a different representation of the orthogonal factor and therefore requires more software development to support the new representation. Further, implicitly applying the orthogonal factor to the trailing matrix in the context of factoring a square matrix is more complicated and costly than with the Householder representation. We show how to perform TSQR and then reconstruct the Householder vector representation with the same asymptotic communication efficiency and little extra computational cost. We demonstrate the high performance and numerical stability of this algorithm both theoretically and empirically. The new Householder reconstruction algorithm allows us to design more efficient parallel QR algorithms, with significantly lower latency cost compared to Householder QR and lower bandwidth and latency costs compared with Communication-Avoiding QR (CAQR) algorithm. Experiments on supercomputers demonstrate the benefits of the communication cost improvements: in particular, our experiments show substantial improvements over tuned library implementations for tall-and-skinny matrices. Furthermore, we also provide algorithmic improvements to the Householder QR and CAQR algorithms, and we investigate several alternatives to the Householder reconstruction algorithm that sacrifice guarantees on numerical stability in some cases in order to obtain higher performance.},
doi = {10.1016/j.jpdc.2015.06.003},
journal = {Journal of Parallel and Distributed Computing},
number = C,
volume = 85,
place = {United States},
year = {2015},
month = {Aug}
}

Citation Metrics:
Cited by: 10 works
Citation information provided by
Web of Science

Works referenced in this record:

A blocked QR-decomposition for the parallel symmetric eigenvalue problem
journal, July 2014

Minimizing Communication in Numerical Linear Algebra
journal, July 2011

  • Ballard, Grey; Demmel, James; Holtz, Olga
  • SIAM Journal on Matrix Analysis and Applications, Vol. 32, Issue 3
  • DOI: 10.1137/090769156

Collective communication: theory, practice, and experience
journal, January 2007

  • Chan, Ernie; Heimlich, Marcel; Purkayastha, Avi
  • Concurrency and Computation: Practice and Experience, Vol. 19, Issue 13
  • DOI: 10.1002/cpe.1206

Elemental: A New Framework for Distributed Memory Dense Matrix Computations
journal, February 2013

  • Poulson, Jack; Marker, Bryan; van de Geijn, Robert A.
  • ACM Transactions on Mathematical Software, Vol. 39, Issue 2
  • DOI: 10.1145/2427023.2427030

Communication-optimal Parallel and Sequential QR and LU Factorizations
journal, January 2012

  • Demmel, James; Grigori, Laura; Hoemmen, Mark
  • SIAM Journal on Scientific Computing, Vol. 34, Issue 1
  • DOI: 10.1137/080731992

Hierarchical QR factorization algorithms for multi-core clusters
journal, April 2013

Broadcast Time in Communication Networks
journal, October 1980

  • Farley, Arthur M.
  • SIAM Journal on Applied Mathematics, Vol. 39, Issue 2
  • DOI: 10.1137/0139032

The WY Representation for Products of Householder Matrices
journal, January 1987

  • Bischof, Christian; Van Loan, Charles
  • SIAM Journal on Scientific and Statistical Computing, Vol. 8, Issue 1
  • DOI: 10.1137/0908009

A Storage-Efficient $WY$ Representation for Products of Householder Transformations
journal, January 1989

  • Schreiber, Robert; Van Loan, Charles
  • SIAM Journal on Scientific and Statistical Computing, Vol. 10, Issue 1
  • DOI: 10.1137/0910005

Backward error analysis of the AllReduce algorithm for householder QR decomposition
journal, December 2011

  • Mori, Daisuke; Yamamoto, Yusaku; Zhang, Shao-Liang
  • Japan Journal of Industrial and Applied Mathematics, Vol. 29, Issue 1
  • DOI: 10.1007/s13160-011-0053-x

Modification of the Householder Method Based on the Compact WY Representation
journal, May 1992

  • Puglisi, Chiara
  • SIAM Journal on Scientific and Statistical Computing, Vol. 13, Issue 3
  • DOI: 10.1137/0913042

Block Reflectors: Theory and Computation
journal, February 1988

  • Schreiber, Robert; Parlett, Beresford
  • SIAM Journal on Numerical Analysis, Vol. 25, Issue 1
  • DOI: 10.1137/0725014

A Basis-Kernel Representation of Orthogonal Matrices
journal, October 1995

  • Sun, Xiaobai; Bischof, Christian
  • SIAM Journal on Matrix Analysis and Applications, Vol. 16, Issue 4
  • DOI: 10.1137/S0895479894276369

Optimization of Collective Communication Operations in MPICH
journal, February 2005

  • Thakur, Rajeev; Rabenseifner, Rolf; Gropp, William
  • The International Journal of High Performance Computing Applications, Vol. 19, Issue 1
  • DOI: 10.1177/1094342005051521

Communication-efficient parallel generic pairwise elimination
journal, February 2007

Works referencing / citing this record:

Tall-and-skinny QR factorization with approximate Householder reflectors on graphics processors
journal, January 2020

Look-ahead in the two-sided reduction to compact band forms for symmetric eigenvalue problems and the SVD
journal, March 2018

  • Rodríguez-Sánchez, Rafael; Catalán, Sandra; Herrero, José R.
  • Numerical Algorithms, Vol. 80, Issue 2
  • DOI: 10.1007/s11075-018-0500-8

Scaling Up Parallel Computation of Tiled QR Factorizations by a Distributed Scheduling Runtime System and Analytical Modeling
journal, March 2018

Numerical algorithms for high-performance computational science
journal, January 2020

  • Dongarra, Jack; Grigori, Laura; Higham, Nicholas J.
  • Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, Vol. 378, Issue 2166
  • DOI: 10.1098/rsta.2019.0066

A Communication-Avoiding Parallel Algorithm for the Symmetric Eigenvalue Problem
conference, July 2017

  • Solomonik, Edgar; Ballard, Grey; Demmel, James
  • Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures
  • DOI: 10.1145/3087556.3087561