%0 Computer Program
%T TuckerCompressMPI v. 1.0
%X As parallel computing trends towards the exascale, scientific data produced by high-fidelity simulations are growing increasingly massive. For instance, a simulation on a three-dimensional spatial grid with 512 points per dimension that tracks 64 variables per grid point for 128 time steps yields 8 TB of data. By viewing the data as a dense five-way tensor, we can compute a Tucker decomposition to find inherent low-dimensional multilinear structure, achieving compression ratios of up to 10000 on real-world data sets with negligible loss in accuracy. So that we can operate on such massive data, we present the first-ever distributed-memory parallel implementation for the Tucker decomposition, whose key computations correspond to parallel linear algebra operations, albeit with nonstandard data layouts. Our approach specifies a data distribution for tensors that avoids any tensor data redistribution, either locally or in parallel. This software provides a method for compressing large-scale multiway data.
%A Austin, Woody
%A Klinvex, Alicia
%A Ballard, Grey
%A Kolda, Tamara
%R https://doi.org/10.11578/dc.20201001.6
%U https://www.osti.gov/doecode/biblio/45231
%C United States
%D 2016
%G English
%2 USDOE
%1 AC04-94AL85000 2016-09-21