DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: TuckerMPI: A Parallel C++/MPI Software Package for Large-scale Data Compression via the Tucker Tensor Decomposition

Abstract

The goal of this study is the compression of massive-scale grid-structured data, such as the multi-terabyte output of a high-fidelity computational simulation. For such data sets, we have developed TuckerMPI, a new parallel C++/MPI software package for compressing distributed data. The approach treats the data as a tensor, i.e., a multidimensional array, and computes its truncated Tucker decomposition, a higher-order analogue of the truncated singular value decomposition of a matrix. The result is a low-rank approximation of the original tensor-structured data. Compression efficiency comes from detecting latent global structure within the data, in contrast to most compression methods, which focus on local structure. In this work, we describe TuckerMPI, our implementation of the truncated Tucker decomposition, including details of the data distribution and in-memory layouts, the parallel and serial implementations of the key kernels, and an analysis of the storage, communication, and computational costs. We test the software on 4.5 and 6.7 terabyte data sets distributed across hundreds of nodes (thousands of MPI processes), achieving compression ratios between 100× and 200,000×, which equates to 99–99.999% compression (depending on the desired accuracy), in substantially less time than it would take to even read the same data set from a parallel file system. Moreover, we show that our method also allows for reconstruction of partial or down-sampled data on a single node, without a parallel computer, so long as the reconstructed portion is small enough to fit on a single machine, e.g., when reconstructing or visualizing a single down-sampled time step or computing summary statistics. The code is available at https://gitlab.com/tensors/TuckerMPI.
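To make the link between the desired accuracy and the achieved compression ratio concrete, the sketch below assumes the rank-selection rule commonly used with the sequentially truncated higher-order SVD: in each mode, discard the smallest eigenvalues of the Gram matrix of the mode-n unfolding while their sum stays within ε²‖X‖²/N, which keeps the overall relative error at or below ε. It then counts storage: a dense I_1 × … × I_N tensor needs ∏ I_n values, while its Tucker form needs ∏ R_n core values plus ∑ I_n R_n factor-matrix entries. The names `choose_rank` and `compression_ratio` are hypothetical illustrations, not TuckerMPI's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not TuckerMPI's API). Assumes `evals` holds the
// eigenvalues of the Gram matrix of the current mode-n unfolding, sorted in
// descending order. Returns the smallest rank R whose discarded eigenvalues
// sum to at most eps^2 * ||X||^2 / N, the per-mode budget that keeps the
// total relative error of the truncated Tucker model at or below eps.
std::size_t choose_rank(const std::vector<double>& evals, double eps,
                        double norm_sq, std::size_t num_modes) {
    assert(!evals.empty() && num_modes > 0);
    const double budget = eps * eps * norm_sq / static_cast<double>(num_modes);
    double discarded = 0.0;
    std::size_t rank = evals.size();
    // Drop the smallest remaining eigenvalue while it still fits in the budget.
    while (rank > 1 && discarded + evals[rank - 1] <= budget) {
        discarded += evals[rank - 1];
        --rank;
    }
    return rank;
}

// Ratio of dense storage (prod I_n) to Tucker storage: the core
// (prod R_n) plus the factor matrices (sum I_n * R_n).
double compression_ratio(const std::vector<std::size_t>& dims,
                         const std::vector<std::size_t>& ranks) {
    assert(dims.size() == ranks.size());
    double full = 1.0, core = 1.0, factors = 0.0;
    for (std::size_t n = 0; n < dims.size(); ++n) {
        full *= static_cast<double>(dims[n]);
        core *= static_cast<double>(ranks[n]);
        factors += static_cast<double>(dims[n]) * static_cast<double>(ranks[n]);
    }
    return full / (core + factors);
}
```

For example, with hypothetical dimensions 100 × 100 × 100 and ranks 10 × 10 × 10, `compression_ratio` returns 250, i.e., the compressed form stores 1/250 of the original values (99.6% savings). Loosening ε lets `choose_rank` drop more eigenvalues, which shrinks the ranks and raises the ratio.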

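Partial reconstruction works because each entry of the approximation depends only on the core and one row of each factor matrix: X̂(i,j,k) = ∑_{a,b,c} G(a,b,c) U1(i,a) U2(j,b) U3(k,c). Selecting a subset of rows from each factor (a strided subset for down-sampling, or a single index in the time mode for one time step) therefore reconstructs only that piece. The minimal 3-way sketch below assumes column-major storage and uses hypothetical type and function names (`Matrix`, `Tucker3`, `reconstruct_subset`); it writes the sum out directly for clarity, whereas an efficient version would apply the row-subsetted factors one mode at a time as tensor-times-matrix products.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical column-major matrix: element (i, r) is data[i + r * rows].
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    double operator()(std::size_t i, std::size_t r) const { return data[i + r * rows]; }
};

// Hypothetical 3-way Tucker model: core G is R1 x R2 x R3 with the first
// index fastest, and U1, U2, U3 are the factor matrices (I_n x R_n).
struct Tucker3 {
    std::size_t R1, R2, R3;
    std::vector<double> core;
    Matrix U1, U2, U3;
    double g(std::size_t a, std::size_t b, std::size_t c) const {
        return core[a + R1 * (b + R2 * c)];
    }
};

// Reconstruct only the entries at the requested indices of each mode:
// Xhat(i,j,k) = sum_{a,b,c} G(a,b,c) * U1(i,a) * U2(j,b) * U3(k,c).
// Strided index lists give a down-sampled tensor; a single index in the
// third mode gives one slice (e.g., one time step). Output is column-major.
std::vector<double> reconstruct_subset(const Tucker3& t,
                                       const std::vector<std::size_t>& I,
                                       const std::vector<std::size_t>& J,
                                       const std::vector<std::size_t>& K) {
    assert(t.core.size() == t.R1 * t.R2 * t.R3);
    std::vector<double> out(I.size() * J.size() * K.size(), 0.0);
    for (std::size_t kk = 0; kk < K.size(); ++kk)
        for (std::size_t jj = 0; jj < J.size(); ++jj)
            for (std::size_t ii = 0; ii < I.size(); ++ii) {
                double sum = 0.0;
                for (std::size_t c = 0; c < t.R3; ++c)
                    for (std::size_t b = 0; b < t.R2; ++b)
                        for (std::size_t a = 0; a < t.R1; ++a)
                            sum += t.g(a, b, c) * t.U1(I[ii], a) *
                                   t.U2(J[jj], b) * t.U3(K[kk], c);
                out[ii + I.size() * (jj + J.size() * kk)] = sum;
            }
    return out;
}
```

Only the core and the selected factor rows are touched, so the memory needed is governed by the size of the requested piece rather than the full data set.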
Authors:
 Ballard, Grey [1];  Klinvex, Alicia [2];  Kolda, Tamara G. [2]
  1. Wake Forest University, Winston-Salem, NC (United States)
  2. Sandia National Lab. (SNL-CA), Livermore, CA (United States)
Publication Date:
June 11, 2020
Research Org.:
Sandia National Lab. (SNL-CA), Livermore, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); USDOE National Nuclear Security Administration (NNSA); National Science Foundation (NSF)
OSTI Identifier:
1639093
Report Number(s):
SAND-2020-6977J
Journal ID: ISSN 0098-3500; 687210
Grant/Contract Number:  
AC04-94AL85000; OAC-1642385; NA0003525
Resource Type:
Accepted Manuscript
Journal Name:
ACM Transactions on Mathematical Software
Additional Journal Information:
Journal Volume: 46; Journal Issue: 2; Journal ID: ISSN 0098-3500
Publisher:
Association for Computing Machinery
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Mathematics of computing; mathematical software performance; computations on matrices

Citation Formats

Ballard, Grey, Klinvex, Alicia, and Kolda, Tamara G. TuckerMPI: A Parallel C++/MPI Software Package for Large-scale Data Compression via the Tucker Tensor Decomposition. United States: N. p., 2020. Web. doi:10.1145/3378445.
Ballard, Grey, Klinvex, Alicia, & Kolda, Tamara G. TuckerMPI: A Parallel C++/MPI Software Package for Large-scale Data Compression via the Tucker Tensor Decomposition. United States. https://doi.org/10.1145/3378445
Ballard, Grey, Klinvex, Alicia, and Kolda, Tamara G. 2020. "TuckerMPI: A Parallel C++/MPI Software Package for Large-scale Data Compression via the Tucker Tensor Decomposition". United States. https://doi.org/10.1145/3378445. https://www.osti.gov/servlets/purl/1639093.
@article{osti_1639093,
title = {TuckerMPI: A Parallel C++/MPI Software Package for Large-scale Data Compression via the Tucker Tensor Decomposition},
author = {Ballard, Grey and Klinvex, Alicia and Kolda, Tamara G.},
abstractNote = {With this study, our goal is compression of massive-scale grid-structured data, such as the multi-terabyte output of a high-fidelity computational simulation. For such data sets, we have developed a new software package called TuckerMPI, a parallel C++/MPI software package for compressing distributed data. The approach is based on treating the data as a tensor, i.e., a multidimensional array, and computing its truncated Tucker decomposition, a higher-order analogue to the truncated singular value decomposition of a matrix. The result is a low-rank approximation of the original tensor-structured data. Compression efficiency is achieved by detecting latent global structure within the data, which we contrast to most compression methods that are focused on local structure. In this work, we describe TuckerMPI, our implementation of the truncated Tucker decomposition, including details of the data distribution and in-memory layouts, the parallel and serial implementations of the key kernels, and analysis of the storage, communication, and computational costs. We test the software on 4.5 and 6.7 terabyte data sets distributed across 100s of nodes (1,000s of MPI processes), achieving compression ratios between 100 and 200,000×, which equates to 99--99.999% compression (depending on the desired accuracy) in substantially less time than it would take to even read the same dataset from a parallel file system. Moreover, we show that our method also allows for reconstruction of partial or down-sampled data on a single node, without a parallel computer so long as the reconstructed portion is small enough to fit on a single machine, e.g., in the instance of reconstructing/visualizing a single down-sampled time step or computing summary statistics. The code is available at https://gitlab.com/tensors/TuckerMPI.},
doi = {10.1145/3378445},
journal = {ACM Transactions on Mathematical Software},
number = 2,
volume = 46,
place = {United States},
year = {2020},
month = jun
}
