Simulated Half-Precision Implementation of Blocked QR Factorization and Graph Clustering Applications
- Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
We explored half-precision implementations of blocked QR algorithms with the following motivations: 1. New GPUs perform fast half-precision arithmetic (4 to 16 times as fast as double precision). 2. QR factorization is a basic linear algebra tool useful for many physics and data analysis applications. 3. Communication-avoiding, parallelizable QR algorithms already exist for tall-and-skinny matrices. While the standard QR algorithms are highly unstable in half precision, our numerical simulations show that the Tall-and-Skinny QR (TSQR) algorithm can improve the backward error of QR factorization. When using subspace iteration for graph clustering applications, half-precision accuracy in forming the eigenspace is sufficient for clustering with high precision and recall for some medium-scale benchmark problems. Note that all half-precision arithmetic was simulated with conversions to single-precision floats. Therefore, the results from this work are somewhat optimistic.
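The idea of simulating half precision and measuring the backward error of a TSQR factorization can be illustrated with a minimal NumPy sketch. It is not the report's code and does not reproduce its results. The helper names `round_fp16`, `tsqr`, and `backward_error` are hypothetical, the matrix size and block count are arbitrary, and the sketch assumes a one-level TSQR in which only the stored factors are rounded to half precision (each local QR still runs in higher precision inside `np.linalg.qr`). That is a coarser simulation than rounding every arithmetic operation, so it is, if anything, more optimistic than the simulation described above.

```python
import numpy as np

def round_fp16(x):
    """Simulate half-precision storage: round to float16, keep as float32."""
    return np.asarray(x).astype(np.float16).astype(np.float32)

def tsqr(A, num_blocks=4, rnd=None):
    """One-level TSQR sketch (hypothetical helper, not the report's code).

    Each row block is factored independently, the small R factors are
    stacked and factored again, and Q is assembled from the two levels.
    `rnd` is applied to every stored factor to mimic low-precision storage.
    """
    if rnd is None:
        rnd = lambda x: x  # no rounding: reference computation
    n = A.shape[1]
    local_qs, local_rs = [], []
    for block in np.array_split(A, num_blocks, axis=0):
        q, r = np.linalg.qr(block)          # reduced QR of one row block
        local_qs.append(rnd(q))
        local_rs.append(rnd(r))
    q2, r = np.linalg.qr(np.vstack(local_rs))  # QR of the stacked R factors
    q2, r = rnd(q2), rnd(r)
    # Block i of the final Q is (local Q_i) times (rows i*n..(i+1)*n of q2).
    q = np.vstack([rnd(qi @ q2[i * n:(i + 1) * n])
                   for i, qi in enumerate(local_qs)])
    return q, r

def backward_error(A, Q, R):
    """Relative backward error ||A - QR||_F / ||A||_F, evaluated in float64."""
    A64 = np.asarray(A, dtype=np.float64)
    QR = np.asarray(Q, dtype=np.float64) @ np.asarray(R, dtype=np.float64)
    return np.linalg.norm(A64 - QR) / np.linalg.norm(A64)

rng = np.random.default_rng(0)
A = round_fp16(rng.standard_normal((1024, 16)))  # tall-and-skinny test matrix

Q_ref, R_ref = tsqr(A.astype(np.float64))        # double-precision reference
Q, R = tsqr(A, rnd=round_fp16)                   # factors stored in simulated fp16

err_ref = backward_error(A, Q_ref, R_ref)
err_half = backward_error(A, Q, R)
print(f"reference backward error:      {err_ref:.2e}")
print(f"simulated-fp16 backward error: {err_half:.2e}")
```

Each block must have at least as many rows as the matrix has columns so that every local R factor is square. Comparing this against an unblocked factorization with per-operation rounding, which is the comparison the abstract refers to, would need a hand-written Householder QR and is outside the scope of this sketch.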
- Research Organization:
- Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
- Sponsoring Organization:
- USDOE National Nuclear Security Administration (NNSA)
- DOE Contract Number:
- AC52-07NA27344
- OSTI ID:
- 1466174
- Report Number(s):
- LLNL-TR--756282; 943829
- Country of Publication:
- United States
- Language:
- English