U.S. Department of Energy
Office of Scientific and Technical Information

A communication-avoiding, hybrid-parallel, rank-revealing orthogonalization method.

Conference
OSTI ID:1032929

Orthogonalization consumes much of the run time of many iterative methods for solving sparse linear systems and eigenvalue problems. Commonly used algorithms, such as variants of Gram-Schmidt or Householder QR, have performance dominated by communication. Here, 'communication' includes both data movement between the CPU and memory, and messages between processors in parallel. Our Tall Skinny QR (TSQR) family of algorithms requires asymptotically fewer messages between processors and less data movement between CPU and memory than typical orthogonalization methods, yet achieves the same accuracy as Householder QR factorization. Furthermore, in block orthogonalizations, TSQR is faster and more accurate than existing approaches for orthogonalizing the vectors within each block ('normalization'). TSQR's rank-revealing capability also makes it useful for detecting deflation in block iterative methods, for which existing approaches sacrifice performance, accuracy, or both. We have implemented a version of TSQR that exploits both distributed-memory and shared-memory parallelism, and supports both real and complex arithmetic. Our implementation is optimized for the case of orthogonalizing a small number (5-20) of very long vectors. The shared-memory parallel component uses Intel's Threading Building Blocks, though its modular design supports other shared-memory programming models as well, including computation on the GPU. Our implementation achieves speedups of 2 times or more over competing orthogonalizations. It is available now in the development branch of the Trilinos software package, and will be included in the 10.8 release.
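To illustrate the idea behind TSQR described in the abstract, the following is a minimal serial sketch in Python with NumPy. It is not the Trilinos implementation: the function names `tsqr` and `numerical_rank`, the one-level reduction, the explicit assembly of Q, and the `rel_tol` threshold are all assumptions made for illustration. Each row block stands in for the data owned by one processor, so the only "communication" is the exchange of the small n-by-n R factors. Rank detection here uses the singular values of the final R factor, which is one common approach and is assumed rather than taken from the record.

```python
import numpy as np


def tsqr(A, num_blocks=4):
    """One-level TSQR sketch: return (Q, R) with A = Q @ R.

    A is tall and skinny (many rows, few columns). Each row block plays
    the role of one processor's local data (an illustrative assumption).
    """
    m, n = A.shape
    blocks = np.array_split(A, num_blocks, axis=0)
    if min(b.shape[0] for b in blocks) < n:
        raise ValueError("each row block needs at least as many rows as columns")

    # Stage 1: independent Householder QR of every block (no communication).
    local = [np.linalg.qr(b) for b in blocks]  # reduced mode: Q_i is m_i x n

    # Stage 2: stack the small n x n R factors and factor them once more.
    # In a parallel setting only these small factors would be exchanged.
    R_stack = np.vstack([R_i for _, R_i in local])
    Q2, R = np.linalg.qr(R_stack)

    # Assemble an explicit Q by applying the matching n x n slice of Q2
    # to each local Q_i. Forming Q explicitly keeps the sketch short.
    Q = np.vstack(
        [Q_i @ Q2[i * n:(i + 1) * n] for i, (Q_i, _) in enumerate(local)]
    )
    return Q, R


def numerical_rank(R, rel_tol=1e-10):
    """Estimate rank from the singular values of the small R factor.

    rel_tol is an illustrative threshold, not a value from the source.
    """
    s = np.linalg.svd(R, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))
```

A usage example under the same assumptions: for `A = np.random.default_rng(0).standard_normal((1000, 8))`, calling `Q, R = tsqr(A)` gives factors for which `np.allclose(Q @ R, A)` holds and `Q` has orthonormal columns. If one column of `A` is replaced by a linear combination of the others, `numerical_rank(R)` drops by one, which is the kind of signal a block iterative method could use to detect deflation.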

Research Organization:
Sandia National Laboratories
Sponsoring Organization:
USDOE
DOE Contract Number:
AC04-94AL85000
OSTI ID:
1032929
Report Number(s):
SAND2010-8173C
Country of Publication:
United States
Language:
English

Similar Records

Reconstructing householder vectors from Tall-Skinny QR
Journal Article · Aug 5, 2015 · Journal of Parallel and Distributed Computing · OSTI ID:1236219

Simulated Half-Precision Implementation of Blocked QR Factorization and Graph Clustering Applications
Technical Report · Aug 8, 2018 · OSTI ID:1466174

A parallel divide and conquer algorithm for the symmetric eigenvalue problem on distributed memory architectures
Journal Article · Jul 1, 1999 · SIAM Journal on Scientific Computing · OSTI ID:20005552