DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: A 3D Parallel Algorithm for QR Decomposition

Abstract

Interprocessor communication often dominates the runtime of large matrix computations. Here, we present a parallel algorithm for computing QR decompositions whose bandwidth cost (communication volume) can be decreased at the cost of increasing its latency cost (number of messages). By varying a parameter to navigate the bandwidth/latency tradeoff, we can tune this algorithm for machines with different communication costs.

Authors:
Ballard, Grey [1]; Demmel, James [2]; Grigori, Laura [3]; Jacquelin, Mathias [4]; Knight, Nicholas [5]
  1. Wake Forest Univ., Winston Salem, NC (United States)
  2. Univ. of California, Berkeley, CA (United States)
  3. INRIA Paris-Rocquencourt, Paris (France)
  4. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  5. New York Univ. (NYU), NY (United States)
Publication Date:
July 2018
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC); National Science Foundation (NSF); European Union (EU)
OSTI Identifier:
1525281
Grant/Contract Number:  
AC02-05CH11231; ACI-1642385; 671633
Resource Type:
Accepted Manuscript
Journal Name:
Annual ACM Symposium on Parallelism in Algorithms and Architectures
Additional Journal Information:
Journal Volume: 2018; Conference: Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures - SPAA '18, Vienna (Austria), 16-18 July 2018; Journal ID: ISSN 1548-6109
Publisher:
Association for Computing Machinery (ACM)
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Ballard, Grey, Demmel, James, Grigori, Laura, Jacquelin, Mathias, and Knight, Nicholas. A 3D Parallel Algorithm for QR Decomposition. United States: N. p., 2018. Web. doi:10.1145/3210377.3210415.
Ballard, Grey, Demmel, James, Grigori, Laura, Jacquelin, Mathias, & Knight, Nicholas. (2018). A 3D Parallel Algorithm for QR Decomposition. United States. https://doi.org/10.1145/3210377.3210415
Ballard, Grey, Demmel, James, Grigori, Laura, Jacquelin, Mathias, and Knight, Nicholas. 2018. "A 3D Parallel Algorithm for QR Decomposition". United States. https://doi.org/10.1145/3210377.3210415. https://www.osti.gov/servlets/purl/1525281.
@article{osti_1525281,
title = {A 3D Parallel Algorithm for QR Decomposition},
author = {Ballard, Grey and Demmel, James and Grigori, Laura and Jacquelin, Mathias and Knight, Nicholas},
abstractNote = {Interprocessor communication often dominates the runtime of large matrix computations. Here, we present a parallel algorithm for computing QR decompositions whose bandwidth cost (communication volume) can be decreased at the cost of increasing its latency cost (number of messages). By varying a parameter to navigate the bandwidth/latency tradeoff, we can tune this algorithm for machines with different communication costs.},
doi = {10.1145/3210377.3210415},
journal = {Annual ACM Symposium on Parallelism in Algorithms and Architectures},
volume = {2018},
place = {United States},
year = {2018},
month = {7}
}

Works referenced in this record:

Modification of the Householder Method Based on the Compact WY Representation
journal, May 1992

  • Puglisi, Chiara
  • SIAM Journal on Scientific and Statistical Computing, Vol. 13, Issue 3
  • DOI: 10.1137/0913042

Communication-Avoiding Parallel Algorithms for Solving Triangular Systems of Linear Equations
conference, May 2017

  • Wicky, Tobias; Solomonik, Edgar; Hoefler, Torsten
  • 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
  • DOI: 10.1109/IPDPS.2017.104

Parallel Matrix Multiplication: A Systematic Journey
journal, January 2016

  • Schatz, Martin D.; van de Geijn, Robert A.; Poulson, Jack
  • SIAM Journal on Scientific Computing, Vol. 38, Issue 6
  • DOI: 10.1137/140993478

Efficient algorithms for all-to-all communications in multiport message-passing systems
journal, January 1997

  • Bruck, J.; Kipnis, S.
  • IEEE Transactions on Parallel and Distributed Systems, Vol. 8, Issue 11
  • DOI: 10.1109/71.642949

A Communication-Time Tradeoff
journal, August 1987

  • Papadimitriou, Christos H.; Ullman, Jeffrey D.
  • SIAM Journal on Computing, Vol. 16, Issue 4
  • DOI: 10.1137/0216044

A three-dimensional approach to parallel matrix multiplication
journal, September 1995

  • Agarwal, R. C.; Balle, S. M.; Gustavson, F. G.
  • IBM Journal of Research and Development, Vol. 39, Issue 5
  • DOI: 10.1147/rd.395.0575

Communication lower bounds and optimal algorithms for numerical linear algebra
journal, May 2014


Reconstructing Householder vectors from Tall-Skinny QR
journal, November 2015


Hypergraph Partitioning for Sparse Matrix-Matrix Multiplication
journal, December 2016

  • Ballard, Grey; Druinsky, Alex; Knight, Nicholas
  • ACM Transactions on Parallel Computing, Vol. 3, Issue 3
  • DOI: 10.1145/3015144

Tall and skinny QR factorizations in MapReduce architectures
conference, January 2011

  • Constantine, Paul G.; Gleich, David F.
  • Proceedings of the second international workshop on MapReduce and its applications - MapReduce '11
  • DOI: 10.1145/1996092.1996103

Communication-optimal Parallel and Sequential QR and LU Factorizations
journal, January 2012

  • Demmel, James; Grigori, Laura; Hoemmen, Mark
  • SIAM Journal on Scientific Computing, Vol. 34, Issue 1
  • DOI: 10.1137/080731992

Applying recursion to serial and parallel QR factorization leads to better performance
journal, July 2000

  • Elmroth, E.; Gustavson, F. G.
  • IBM Journal of Research and Development, Vol. 44, Issue 4
  • DOI: 10.1147/rd.444.0605

Recursive Blocked Algorithms and Hybrid Data Structures for Dense Matrix Library Software
journal, January 2004


Unitary Triangularization of a Nonsymmetric Matrix
journal, October 1958


Trade-Offs Between Synchronization, Communication, and Computation in Parallel Linear Algebra Computations
journal, June 2016

  • Solomonik, Edgar; Carson, Erin; Knight, Nicholas
  • ACM Transactions on Parallel Computing, Vol. 3, Issue 1
  • DOI: 10.1145/2897188

Optimization of Collective Communication Operations in MPICH
journal, February 2005

  • Thakur, Rajeev; Rabenseifner, Rolf; Gropp, William
  • The International Journal of High Performance Computing Applications, Vol. 19, Issue 1
  • DOI: 10.1177/1094342005051521

Communication-efficient parallel generic pairwise elimination
journal, February 2007


A bridging model for parallel computation
journal, August 1990


Improving the Performance of CA-GMRES on Multicores with Multiple GPUs
conference, May 2014

  • Yamazaki, Ichitaro; Anzt, Hartwig; Tomov, Stanimire
  • 2014 IEEE 28th International Parallel and Distributed Processing Symposium (IPDPS)
  • DOI: 10.1109/IPDPS.2014.48