DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Software for Sparse Tensor Decomposition on Emerging Computing Architectures

Abstract

In this paper, we develop software for decomposing sparse tensors that is portable to and performant on a variety of multicore, manycore, and GPU computing architectures. The result is a single code whose performance matches optimized architecture-specific implementations. The key to a portable approach is to determine multiple levels of parallelism that can be mapped in different ways to different architectures, and we explain how to do this for the matricized tensor times Khatri-Rao product (MTTKRP), which is the key kernel in canonical polyadic tensor decomposition. Our implementation leverages the Kokkos framework, which enables a single code to achieve high performance across multiple architectures that differ in how they approach fine-grained parallelism. We also introduce a new construct for portable thread-local arrays, which we call compile-time polymorphic arrays. Not only are the specifics of our approaches and implementation interesting for tuning tensor computations, but they also provide a roadmap for developing other portable high-performance codes. As a last step in optimizing performance, we modify the MTTKRP algorithm itself to do a permuted traversal of tensor nonzeros to reduce atomic-write contention. Lastly, we test the performance of our implementation on 16- and 68-core Intel CPUs and the K80 and P100 NVIDIA GPUs, showing that we are competitive with state-of-the-art architecture-specific codes while having the advantage of being able to run on a variety of architectures.
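
To make the MTTKRP kernel and the idea of a permuted traversal concrete, the following is a minimal serial sketch in plain Python. It is not the authors' Kokkos implementation: the coordinate (COO) storage, the function names (`khatri_rao_row`, `mttkrp_coo`, `mttkrp_coo_permuted`), and the small example tensor are assumptions made for illustration only. The first routine updates the output once per nonzero, which in a parallel code would be one atomic write per nonzero. The second visits nonzeros sorted by their index in the target mode and accumulates locally, so each output row is written once per run of equal indices.

```python
def khatri_rao_row(idx, val, factors, mode, rank):
    """Scaled elementwise product of the factor rows picked by one nonzero,
    skipping the factor matrix of the target mode."""
    row = [val] * rank
    for m, factor in enumerate(factors):
        if m != mode:
            for r in range(rank):
                row[r] *= factor[idx[m]][r]
    return row


def mttkrp_coo(subs, vals, factors, mode):
    """MTTKRP for a sparse tensor in COO form: one output update per nonzero."""
    rank = len(factors[0][0])
    out = [[0.0] * rank for _ in range(len(factors[mode]))]
    for idx, val in zip(subs, vals):
        row = khatri_rao_row(idx, val, factors, mode, rank)
        for r in range(rank):
            # In a parallel code this update would need to be atomic.
            out[idx[mode]][r] += row[r]
    return out


def mttkrp_coo_permuted(subs, vals, factors, mode):
    """Same result, but nonzeros are visited in order of their `mode` index
    and accumulated locally; returns (result, number of row flushes)."""
    rank = len(factors[0][0])
    out = [[0.0] * rank for _ in range(len(factors[mode]))]
    # Permutation that sorts nonzeros by their index in the target mode.
    perm = sorted(range(len(vals)), key=lambda k: subs[k][mode])
    flushes = 0
    current, acc = None, [0.0] * rank
    for k in perm:
        i = subs[k][mode]
        if i != current:
            if current is not None:
                for r in range(rank):
                    out[current][r] += acc[r]  # the only write to this row
                flushes += 1
            current, acc = i, [0.0] * rank
        row = khatri_rao_row(subs[k], vals[k], factors, mode, rank)
        for r in range(rank):
            acc[r] += row[r]
    if current is not None:
        for r in range(rank):
            out[current][r] += acc[r]
        flushes += 1
    return out, flushes


# Hypothetical 2 x 2 x 2 tensor with three nonzeros and rank-2 factor matrices.
subs = [(0, 0, 0), (1, 0, 1), (0, 1, 1)]
vals = [1.0, 2.0, 3.0]
factors = [
    [[1, 2], [3, 4]],  # mode-0 factor (unused when mode == 0)
    [[1, 2], [3, 4]],  # mode-1 factor
    [[5, 6], [7, 8]],  # mode-2 factor
]

result = mttkrp_coo(subs, vals, factors, 0)
result_permuted, flushes = mttkrp_coo_permuted(subs, vals, factors, 0)
print(result)            # [[68.0, 108.0], [14.0, 32.0]]
print(result_permuted, flushes)
```

In this example the direct version performs three row updates (one per nonzero), while the permuted version performs two (one per distinct mode-0 index). In a threaded setting each thread would presumably own a contiguous chunk of the permuted nonzeros, so atomic writes would be needed only when a chunk's row changes.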

Authors:
Phipps, Eric T. [1]; Kolda, Tamara G. [2]
  1. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
  2. Sandia National Lab. (SNL-CA), Livermore, CA (United States)
Publication Date:
2019-06-20
Research Org.:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Sandia National Lab. (SNL-CA), Livermore, CA (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA)
OSTI Identifier:
1542146
Report Number(s):
SAND-2019-7264J
Journal ID: ISSN 1064-8275; 676825
Grant/Contract Number:  
AC04-94AL85000
Resource Type:
Accepted Manuscript
Journal Name:
SIAM Journal on Scientific Computing
Additional Journal Information:
Journal Volume: 41; Journal Issue: 3; Journal ID: ISSN 1064-8275
Publisher:
SIAM
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; tensor decomposition; canonical polyadic (CP); MTTKRP; Kokkos; manycore; GPU

Citation Formats

Phipps, Eric T., and Kolda, Tamara G. Software for Sparse Tensor Decomposition on Emerging Computing Architectures. United States: N. p., 2019. Web. doi:10.1137/18M1210691.
Phipps, Eric T., & Kolda, Tamara G. Software for Sparse Tensor Decomposition on Emerging Computing Architectures. United States. https://doi.org/10.1137/18M1210691
Phipps, Eric T., and Kolda, Tamara G. 2019. "Software for Sparse Tensor Decomposition on Emerging Computing Architectures". United States. https://doi.org/10.1137/18M1210691. https://www.osti.gov/servlets/purl/1542146.
@article{osti_1542146,
title = {Software for Sparse Tensor Decomposition on Emerging Computing Architectures},
author = {Phipps, Eric T. and Kolda, Tamara G.},
abstractNote = {In this paper, we develop software for decomposing sparse tensors that is portable to and performant on a variety of multicore, manycore, and GPU computing architectures. The result is a single code whose performance matches optimized architecture-specific implementations. The key to a portable approach is to determine multiple levels of parallelism that can be mapped in different ways to different architectures, and we explain how to do this for the matricized tensor times Khatri--Rao product (MTTKRP), which is the key kernel in canonical polyadic tensor decomposition. Our implementation leverages the Kokkos framework, which enables a single code to achieve high performance across multiple architectures that differ in how they approach fine-grained parallelism. We also introduce a new construct for portable thread-local arrays, which we call compile-time polymorphic arrays. Not only are the specifics of our approaches and implementation interesting for tuning tensor computations, but they also provide a roadmap for developing other portable high-performance codes. As a last step in optimizing performance, we modify the MTTKRP algorithm itself to do a permuted traversal of tensor nonzeros to reduce atomic-write contention. Lastly, we test the performance of our implementation on 16- and 68-core Intel CPUs and the K80 and P100 NVIDIA GPUs, showing that we are competitive with state-of-the-art architecture-specific codes while having the advantage of being able to run on a variety of architectures.},
doi = {10.1137/18M1210691},
journal = {SIAM Journal on Scientific Computing},
number = 3,
volume = 41,
place = {United States},
year = {2019},
month = {6}
}

Citation Metrics:
Cited by: 11 works
Citation information provided by
Web of Science

Works referenced in this record:

Algorithm 862: MATLAB tensor classes for fast algorithm prototyping
journal, December 2006

  • Bader, Brett W.; Kolda, Tamara G.
  • ACM Transactions on Mathematical Software, Vol. 32, Issue 4
  • DOI: 10.1145/1186785.1186794

Efficient MATLAB Computations with Sparse and Factored Tensors
journal, January 2008

  • Bader, Brett W.; Kolda, Tamara G.
  • SIAM Journal on Scientific Computing, Vol. 30, Issue 1
  • DOI: 10.1137/060676489

Kokkos: Enabling manycore performance portability through polymorphic memory access patterns
journal, December 2014

  • Edwards, H. Carter; Trott, Christian R.; Sunderland, Daniel
  • Journal of Parallel and Distributed Computing, Vol. 74, Issue 12
  • DOI: 10.1016/j.jpdc.2014.07.003

Tensor-based anomaly detection: An interdisciplinary survey
journal, April 2016

Parallel Candecomp/Parafac Decomposition of Sparse Tensors Using Dimension Trees
journal, January 2018

  • Kaya, Oguz; Uçar, Bora
  • SIAM Journal on Scientific Computing, Vol. 40, Issue 1
  • DOI: 10.1137/16M1102744

Tensor Decompositions and Applications
journal, August 2009

  • Kolda, Tamara G.; Bader, Brett W.
  • SIAM Review, Vol. 51, Issue 3
  • DOI: 10.1137/07070111X

Vc: A C++ library for explicit vectorization
journal, December 2011

  • Kretz, Matthias; Lindenstruth, Volker
  • Software: Practice and Experience, Vol. 42, Issue 11
  • DOI: 10.1002/spe.1149

Embedded Ensemble Propagation for Improving Performance, Portability, and Scalability of Uncertainty Quantification on Emerging Computational Architectures
journal, January 2017

  • Phipps, E.; D'Elia, M.; Edwards, H. C.
  • SIAM Journal on Scientific Computing, Vol. 39, Issue 2
  • DOI: 10.1137/15M1044679

A massively parallel tensor contraction framework for coupled-cluster computations
journal, December 2014

  • Solomonik, Edgar; Matthews, Devin; Hammond, Jeff R.
  • Journal of Parallel and Distributed Computing, Vol. 74, Issue 12
  • DOI: 10.1016/j.jpdc.2014.06.002

Works referencing / citing this record:

swTensor: accelerating tensor decomposition on Sunway architecture
journal, November 2019

  • Zhong, Xiaogang; Yang, Hailong; Luan, Zhongzhi
  • CCF Transactions on High Performance Computing, Vol. 1, Issue 3-4
  • DOI: 10.1007/s42514-019-00017-5