A communication-avoiding, hybrid-parallel, rank-revealing orthogonalization method.

Hoemmen, Mark

Conference · November 1, 2010

OSTI ID: 1032929

Orthogonalization consumes much of the run time of many iterative methods for solving sparse linear systems and eigenvalue problems. The performance of commonly used algorithms, such as variants of Gram-Schmidt or Householder QR, is dominated by communication. Here, 'communication' includes both data movement between the CPU and memory, and messages between processors in parallel. Our Tall Skinny QR (TSQR) family of algorithms requires asymptotically fewer messages between processors, and asymptotically less data movement between CPU and memory, than typical orthogonalization methods, yet achieves the same accuracy as Householder QR factorization. Furthermore, in block orthogonalizations, TSQR is faster and more accurate than existing approaches for orthogonalizing the vectors within each block ('normalization'). TSQR's rank-revealing capability also makes it useful for detecting deflation in block iterative methods, for which existing approaches sacrifice performance, accuracy, or both. We have implemented a version of TSQR that exploits both distributed-memory and shared-memory parallelism and supports real and complex arithmetic. Our implementation is optimized for the case of orthogonalizing a small number (5-20) of very long vectors. The shared-memory parallel component uses Intel's Threading Building Blocks, though its modular design also supports other shared-memory programming models, including computation on the GPU. Our implementation achieves speedups of 2x or more over competing orthogonalization methods. It is available now in the development branch of the Trilinos software package and will be included in the 10.8 release.
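The abstract gives no pseudocode, so the following is only a minimal, single-process sketch of the idea behind TSQR, assuming NumPy and a one-level ("flat") reduction tree in which each of `num_blocks` row blocks stands in for the rows owned by one processor. Each block is factored locally; only the small n x n R factors are combined, in a single reduction rather than the one-or-more global reductions per column that standard Gram-Schmidt variants need. The names `tsqr`, `numerical_rank`, `num_blocks`, and `rtol` are hypothetical, and the rank test (counting singular values of the small R factor above a relative tolerance) is one common way to expose numerical rank, assumed here rather than taken from the Trilinos implementation.

```python
import numpy as np


def tsqr(A, num_blocks=4):
    """Hypothetical one-level ("flat tree") TSQR sketch.

    Each row block of A stands in for one processor's rows. Returns (Q, R)
    with Q @ R ~= A, Q having orthonormal columns and R upper triangular.
    """
    m, n = A.shape
    blocks = np.array_split(A, num_blocks, axis=0)
    if any(B.shape[0] < n for B in blocks):
        raise ValueError("each row block needs at least n rows")

    # Step 1 (local, no communication): Householder-based QR of each block.
    # NumPy's default "reduced" mode gives Q_i (m_i x n) and R_i (n x n).
    local = [np.linalg.qr(B) for B in blocks]

    # Step 2 (the only "communication"): stack the small n x n R factors
    # and factor the stack once.
    stacked_R = np.vstack([R_i for _, R_i in local])
    Q2, R = np.linalg.qr(stacked_R)

    # Step 3 (local again): explicit Q = blockdiag(Q_1, ..., Q_P) @ Q2,
    # so each Q_i multiplies only its own n-row slice of Q2.
    Q = np.vstack([Q_i @ Q2[i * n:(i + 1) * n, :]
                   for i, (Q_i, _) in enumerate(local)])
    return Q, R


def numerical_rank(R, rtol=1e-10):
    """Count singular values of R above rtol * sigma_max (assumed criterion).

    R has the same singular values as A, so rank is revealed from an
    n x n matrix without touching the long vectors again.
    """
    s = np.linalg.svd(R, compute_uv=False)
    if s[0] == 0.0:
        return 0
    return int(np.count_nonzero(s > rtol * s[0]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((10_000, 8))    # few, very long vectors
    A[:, 7] = A[:, 0] + A[:, 1]             # one linearly dependent column
    Q, R = tsqr(A, num_blocks=4)
    print(np.allclose(Q @ R, A))            # True
    print(np.allclose(Q.T @ Q, np.eye(8)))  # True
    print(numerical_rank(R))                # 7
```

A distributed implementation would normally replace the flat stack-and-factor step with a deeper reduction tree across processors; this version only illustrates the data flow and why the rank-deficient column can be detected from R alone.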

Research Organization: Sandia National Laboratories (SNL), Albuquerque, NM, and Livermore, CA (United States)

Sponsoring Organization: USDOE

DOE Contract Number: AC04-94AL85000

Report Number(s): SAND2010-8173C; TRN: US201202%%482

Resource Relation: Conference: Proposed for presentation at the 25th IEEE International Parallel & Distributed Processing Symposium, held May 16-20, 2011, in Anchorage, AK.

Country of Publication: United States

Language: English

Similar Records

Reconstructing Householder vectors from Tall-Skinny QR

Journal Article · August 5, 2015 · Journal of Parallel and Distributed Computing

Ballard, Grey Malone; Demmel, James; Grigori, Laura; +3 more

Simulated Half-Precision Implementation of Blocked QR Factorization and Graph Clustering Applications

Technical Report · August 8, 2018

Yang, Lucia Minah; Sanders, Geoffrey D.

Quantum Monte Carlo Endstation for Petascale Computing

Technical Report · March 2, 2011

Ceperley, David

Related Subjects

99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE
ACCURACY
ALGORITHMS
DESIGN
EIGENVALUES
FACTORIZATION
IMPLEMENTATION
ITERATIVE METHODS
PERFORMANCE
PROCESSING
PROGRAMMING
VECTORS
