Load-balanced sparse MTTKRP on GPUs
- Ohio State University
- Battelle (Pacific Northwest National Laboratory)
- Georgia Institute of Technology
Sparse matricized tensor times Khatri-Rao product (MTTKRP) is one of the most computationally expensive kernels in sparse tensor computations. This work focuses on optimizing MTTKRP for floating-point operation count, storage, and scalability. We begin by identifying the performance bottlenecks in directly extending the state-of-the-art CSF (compressed sparse fiber) formats from CPUs to GPUs. Our detailed analysis of recently proposed formats shows that the lower bounds on storage and flop counts can vary significantly depending on the structure of the sparse tensor. To address this, we propose a load-balanced, computation- and storage-efficient scheme, HYB, which combines the best of COO (coordinate), CSF, and CSL (compressed slice). With these enhancements, our GPU framework significantly outperforms the current formats on both CPU and GPU platforms.
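To make the kernel concrete: for a third-order tensor, the mode-1 MTTKRP accumulates, for each nonzero X(i,j,k), the value times the elementwise product of row j of one factor matrix and row k of another into row i of the output. The sketch below illustrates this over a COO-format tensor; it is a minimal pedagogical example, not the paper's HYB scheme, and the function and variable names are hypothetical.

```python
# Mode-1 MTTKRP over a COO-format sparse tensor (illustrative sketch only;
# the paper's HYB scheme combines COO, CSF, and CSL for GPU load balance).
def mttkrp_coo(nonzeros, B, C, I, R):
    """nonzeros: list of (i, j, k, val) COO entries of an I x J x K tensor.
    B: J x R factor matrix, C: K x R factor matrix (lists of lists).
    Returns M = X_(1) (C khatri-rao B), a dense I x R matrix."""
    M = [[0.0] * R for _ in range(I)]
    for (i, j, k, val) in nonzeros:
        for r in range(R):
            # Each nonzero contributes val * B[j][r] * C[k][r] to row i.
            # On a GPU, concurrent nonzeros sharing slice i would need
            # atomic accumulation or a slice-wise partitioning of work.
            M[i][r] += val * B[j][r] * C[k][r]
    return M
```

The per-nonzero loop is where format choice matters: COO gives trivially balanced work per nonzero but redundant index storage, while fiber-compressed formats (CSF/CSL) save flops and memory on dense fibers at the cost of irregular work per thread.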
- Research Organization:
- Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- AC05-76RL01830
- OSTI ID:
- 1862916
- Report Number(s):
- PNNL-SA-138752
- Country of Publication:
- United States
- Language:
- English