TuckerMPI: A Parallel C++/MPI Software Package for Large-scale Data Compression via the Tucker Tensor Decomposition
Abstract
With this study, our goal is compression of massive-scale grid-structured data, such as the multi-terabyte output of a high-fidelity computational simulation. For such data sets, we have developed a new software package called TuckerMPI, a parallel C++/MPI software package for compressing distributed data. The approach is based on treating the data as a tensor, i.e., a multidimensional array, and computing its truncated Tucker decomposition, a higher-order analogue to the truncated singular value decomposition of a matrix. The result is a low-rank approximation of the original tensor-structured data. Compression efficiency is achieved by detecting latent global structure within the data, which we contrast to most compression methods that are focused on local structure. In this work, we describe TuckerMPI, our implementation of the truncated Tucker decomposition, including details of the data distribution and in-memory layouts, the parallel and serial implementations of the key kernels, and analysis of the storage, communication, and computational costs. We test the software on 4.5 and 6.7 terabyte data sets distributed across 100s of nodes (1,000s of MPI processes), achieving compression ratios between 100 and 200,000×, which equates to 99–99.999% compression (depending on the desired accuracy) in substantially less time than it would take to even read the same dataset from a parallel file system. Moreover, we show that our method also allows for reconstruction of partial or down-sampled data on a single node, without a parallel computer, so long as the reconstructed portion is small enough to fit on a single machine, e.g., in the instance of reconstructing/visualizing a single down-sampled time step or computing summary statistics. The code is available at https://gitlab.com/tensors/TuckerMPI.
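The relationship between a compression ratio and the quoted percentage can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of TuckerMPI: it assumes the standard storage count for a Tucker decomposition (one core tensor of size R_1 × ... × R_N plus one I_n × R_n factor matrix per mode), and the function names, tensor dimensions, and ranks are hypothetical, not values from the paper.

```python
from math import prod


def tucker_storage(dims, ranks):
    """Values stored by a Tucker decomposition: core entries plus factor-matrix entries.

    Assumes the usual layout: a core of size R_1 x ... x R_N and one
    I_n x R_n factor matrix per mode (an assumption, not TuckerMPI's API).
    """
    core = prod(ranks)
    factors = sum(i * r for i, r in zip(dims, ranks))
    return core + factors


def compression_ratio(dims, ranks):
    """Ratio of original tensor entries to stored Tucker entries."""
    return prod(dims) / tucker_storage(dims, ranks)


def percent_reduction(ratio):
    """Storage saved, in percent, for a given compression ratio."""
    return 100.0 * (1.0 - 1.0 / ratio)


# Hypothetical 5-way tensor and truncation ranks, chosen only for illustration.
dims = (500, 500, 500, 11, 400)
ranks = (30, 30, 30, 5, 20)
ratio = compression_ratio(dims, ranks)
print(f"ratio = {ratio:.1f}x, reduction = {percent_reduction(ratio):.4f}%")
```

Under this definition a ratio of 100 corresponds to a 99% reduction, and a ratio of 200,000 to a reduction of 99.9995%, consistent with the range quoted in the abstract.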
- Authors:
- Ballard, Grey; Klinvex, Alicia; Kolda, Tamara G.
- Wake Forest University, Winston-Salem, NC (United States)
- Sandia National Lab. (SNL-CA), Livermore, CA (United States)
- Publication Date:
- 2020-06-11
- Research Org.:
- Sandia National Lab. (SNL-CA), Livermore, CA (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); USDOE National Nuclear Security Administration (NNSA); National Science Foundation (NSF)
- OSTI Identifier:
- 1639093
- Report Number(s):
- SAND-2020-6977J
- Journal ID: ISSN 0098-3500; 687210
- Grant/Contract Number:
- AC04-94AL85000; OAC-1642385; NA0003525
- Resource Type:
- Accepted Manuscript
- Journal Name:
- ACM Transactions on Mathematical Software
- Additional Journal Information:
- Journal Volume: 46; Journal Issue: 2; Journal ID: ISSN 0098-3500
- Publisher:
- Association for Computing Machinery
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; Mathematics of computing; mathematical software performance; computations on matrices
Citation Formats
Ballard, Grey, Klinvex, Alicia, and Kolda, Tamara G. TuckerMPI: A Parallel C++/MPI Software Package for Large-scale Data Compression via the Tucker Tensor Decomposition. United States: N. p., 2020. Web. doi:10.1145/3378445.
Ballard, Grey, Klinvex, Alicia, & Kolda, Tamara G. TuckerMPI: A Parallel C++/MPI Software Package for Large-scale Data Compression via the Tucker Tensor Decomposition. United States. https://doi.org/10.1145/3378445
Ballard, Grey, Klinvex, Alicia, and Kolda, Tamara G. 2020. "TuckerMPI: A Parallel C++/MPI Software Package for Large-scale Data Compression via the Tucker Tensor Decomposition". United States. https://doi.org/10.1145/3378445. https://www.osti.gov/servlets/purl/1639093.
@article{osti_1639093,
title = {TuckerMPI: A Parallel C++/MPI Software Package for Large-scale Data Compression via the Tucker Tensor Decomposition},
author = {Ballard, Grey and Klinvex, Alicia and Kolda, Tamara G.},
abstractNote = {With this study, our goal is compression of massive-scale grid-structured data, such as the multi-terabyte output of a high-fidelity computational simulation. For such data sets, we have developed a new software package called TuckerMPI, a parallel C++/MPI software package for compressing distributed data. The approach is based on treating the data as a tensor, i.e., a multidimensional array, and computing its truncated Tucker decomposition, a higher-order analogue to the truncated singular value decomposition of a matrix. The result is a low-rank approximation of the original tensor-structured data. Compression efficiency is achieved by detecting latent global structure within the data, which we contrast to most compression methods that are focused on local structure. In this work, we describe TuckerMPI, our implementation of the truncated Tucker decomposition, including details of the data distribution and in-memory layouts, the parallel and serial implementations of the key kernels, and analysis of the storage, communication, and computational costs. We test the software on 4.5 and 6.7 terabyte data sets distributed across 100s of nodes (1,000s of MPI processes), achieving compression ratios between 100 and 200,000×, which equates to 99--99.999% compression (depending on the desired accuracy) in substantially less time than it would take to even read the same dataset from a parallel file system. Moreover, we show that our method also allows for reconstruction of partial or down-sampled data on a single node, without a parallel computer, so long as the reconstructed portion is small enough to fit on a single machine, e.g., in the instance of reconstructing/visualizing a single down-sampled time step or computing summary statistics. The code is available at https://gitlab.com/tensors/TuckerMPI.},
doi = {10.1145/3378445},
journal = {ACM Transactions on Mathematical Software},
number = 2,
volume = 46,
place = {United States},
year = {2020},
month = {6}
}
Works referenced in this record:
Analysis and compression of six-dimensional gyrokinetic datasets using higher order singular value decomposition
journal, June 2012
- Hatch, D. R.; del-Castillo-Negrete, D.; Terry, P. W.
- Journal of Computational Physics, Vol. 231, Issue 11
Terascale direct numerical simulations of turbulent combustion using S3D
journal, January 2009
- Chen, J. H.; Choudhary, A.; de Supinski, B.
- Computational Science & Discovery, Vol. 2, Issue 1
Fixed-Rate Compressed Floating-Point Arrays
journal, December 2014
- Lindstrom, Peter
- IEEE Transactions on Visualization and Computer Graphics, Vol. 20, Issue 12
High-Performance Dense Tucker Decomposition on GPU Clusters
conference, November 2018
- Choi, Jee; Liu, Xing; Chakaravarthy, Venkatesan
- SC18: International Conference for High Performance Computing, Networking, Storage and Analysis
Data reduction method for droplet deformation experiments based on High Order Singular Value Decomposition
journal, December 2016
- García-Magariño, A.; Sor, S.; Velazquez, A.
- Experimental Thermal and Fluid Science, Vol. 79
Lossy volume compression using Tucker truncation and thresholding
journal, May 2015
- Ballester-Ripoll, Rafael; Pajarola, Renato
- The Visual Computer, Vol. 32, Issue 11
A Multilinear Singular Value Decomposition
journal, January 2000
- De Lathauwer, Lieven; De Moor, Bart; Vandewalle, Joos
- SIAM Journal on Matrix Analysis and Applications, Vol. 21, Issue 4
Accelerating the Tucker Decomposition with Compressed Sparse Tensors
book, January 2017
- Smith, Shaden; Karypis, George
- Lecture Notes in Computer Science
Some mathematical notes on three-mode factor analysis
journal, September 1966
- Tucker, Ledyard R.
- Psychometrika, Vol. 31, Issue 3
An input-adaptive and in-place approach to dense tensor-times-matrix multiply
conference, January 2015
- Li, Jiajia; Battaglino, Casey; Perros, Ioakeim
- Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '15
Structure of hydrogen-rich transverse jets in a vitiated turbulent flow
journal, April 2015
- Lyra, Sgouria; Wilde, Benjamin; Kolla, Hemanth
- Combustion and Flame, Vol. 162, Issue 4
Time-varying, multivariate volume data reduction
conference, January 2005
- Fout, Nathaniel; Ma, Kwan-Liu; Ahrens, James
- Proceedings of the 2005 ACM symposium on Applied computing - SAC '05
A New Truncation Strategy for the Higher-Order Singular Value Decomposition
journal, January 2012
- Vannieuwenhoven, Nick; Vandebril, Raf; Meerbergen, Karl
- SIAM Journal on Scientific Computing, Vol. 34, Issue 2
Optimization of Collective Communication Operations in MPICH
journal, February 2005
- Thakur, Rajeev; Rabenseifner, Rolf; Gropp, William
- The International Journal of High Performance Computing Applications, Vol. 19, Issue 1
Fast Alternating LS Algorithms for High Order CANDECOMP/PARAFAC Tensor Factorizations
journal, October 2013
- Phan, Anh-Huy; Tichavsky, Petr; Cichocki, Andrzej
- IEEE Transactions on Signal Processing, Vol. 61, Issue 19
Velocity and Reactive Scalar Dissipation Spectra in Turbulent Premixed Flames
journal, June 2016
- Kolla, Hemanth; Zhao, Xin-Yu; Chen, Jacqueline H.
- Combustion Science and Technology, Vol. 188, Issue 9
Works referencing / citing this record:
Randomized Functional Sparse Tucker Tensor for Compression and Fast Visualization of Scientific Data
preprint, January 2019
- Rai, Prashant; Kolla, Hemanth; Cannada, Lewis
- arXiv