U.S. Department of Energy
Office of Scientific and Technical Information

Unbalanced Parallel I/O: An Often-Neglected Side Effect of Lossy Scientific Data Compression

Conference

Lossy compression techniques have demonstrated promising results in significantly reducing scientific data size while guaranteeing compression error bounds. However, one important yet often neglected side effect of lossy scientific data compression is its impact on the performance of parallel I/O. Our key observation is that the compressed data size is often highly skewed across processes in lossy scientific compression. To understand this behavior, we conduct extensive experiments in which we apply three lossy compressors specifically designed and optimized for scientific data (MGARD, ZFP, and SZ) to three real-world scientific applications (the Gray-Scott simulation, WarpX, and XGC). Our analysis demonstrates that the size of the compressed data is always skewed, even when the original data is evenly decomposed among processes. Such skewness is widespread across scientific applications and compressors whenever the information density of the data varies across processes. We then systematically study how this side effect of lossy scientific data compression affects parallel I/O performance. We observe that skewed compressed data sizes often lead to I/O imbalance, which can significantly reduce the efficiency of I/O bandwidth utilization if not properly handled. In addition, writing data concurrently to a single shared file through the MPI-IO library is more sensitive to unbalanced I/O loads. Therefore, we believe our research community should pay more attention to the unbalanced parallel I/O caused by lossy scientific data compression.
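The mechanism summarized above can be illustrated with a toy model. The sketch below is a minimal illustration under stated assumptions, not the paper's experimental setup: it swaps in a hypothetical error-bounded compressor (uniform scalar quantization followed by lossless `zlib`) for MGARD, ZFP, and SZ, and gives every simulated rank the same number of values but a different, made-up noise level to stand in for varying information density. It then computes the shared-file write offsets and the bandwidth efficiency that would follow if all ranks write at the same rate and the collective write finishes only when the largest block is done. The helper names (`toy_lossy_compress`, `make_rank_data`, `imbalance_report`) are hypothetical.

```python
import math
import random
import zlib
from array import array
from itertools import accumulate


def toy_lossy_compress(values, error_bound):
    """Toy error-bounded compressor (NOT MGARD, ZFP, or SZ): uniform scalar
    quantization with bin width 2*error_bound, then lossless zlib."""
    bin_width = 2.0 * error_bound
    codes = array("i", (round(v / bin_width) for v in values))
    return zlib.compress(codes.tobytes(), 9)


def toy_decompress(blob, error_bound):
    """Invert the toy compressor; each value is within error_bound of the input."""
    bin_width = 2.0 * error_bound
    codes = array("i")
    codes.frombytes(zlib.decompress(blob))
    return [c * bin_width for c in codes]


def make_rank_data(rank, n=4096, seed=0):
    """Hypothetical per-rank block: every rank holds the same number of values
    (an even decomposition), but the noise level, i.e. the information
    density, grows with the rank id."""
    rng = random.Random(seed + rank)
    noise = 0.05 * rank
    return [math.sin(2.0 * math.pi * i / n) + rng.gauss(0.0, noise) for i in range(n)]


def imbalance_report(sizes):
    """Given compressed bytes per rank, return:
    - offsets: exclusive prefix sum, i.e. where each rank would start
      writing in a single shared file;
    - efficiency: mean/max, the fraction of aggregate bandwidth used if all
      ranks write at the same rate and the collective write ends only when
      the largest block is done;
    - imbalance: max/mean load factor (1.0 means perfectly balanced)."""
    offsets = [0] + list(accumulate(sizes))[:-1]
    mean = sum(sizes) / len(sizes)
    return offsets, mean / max(sizes), max(sizes) / mean


if __name__ == "__main__":
    eb = 1e-3
    sizes = [len(toy_lossy_compress(make_rank_data(r), eb)) for r in range(8)]
    offsets, efficiency, imbalance = imbalance_report(sizes)
    print("compressed bytes per rank:", sizes)
    print("shared-file offsets:      ", offsets)
    print(f"bandwidth efficiency: {efficiency:.2f}, max/mean: {imbalance:.2f}")
```

Under the equal-rate assumption, efficiency reduces to the mean compressed size divided by the largest one, so a single outlier rank caps the whole write. In an MPI program, the per-rank offsets into one shared file would typically be obtained with `MPI_Exscan` before each rank calls a collective write such as `MPI_File_write_at_all`.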

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1883783
Country of Publication:
United States
Language:
English

Similar Records

zPerf: A Statistical Gray-Box Approach to Performance Modeling and Extrapolation for Scientific Lossy Compression
Journal Article · Mar 15, 2023 · IEEE Transactions on Computers · OSTI ID:2424046

Optimizing Error-Bounded Lossy Compression for Scientific Data by Dynamic Spline Interpolation
Conference · Dec 31, 2020 · OSTI ID:1864145

Maintaining Trust in Reduction: Preserving the Accuracy of Quantities of Interest for Lossy Compression
Conference · Feb 28, 2022 · OSTI ID:1855632

Related Subjects