DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: SIRIUS: Enabling Progressive Data Exploration for Extreme-Scale Scientific Data

Abstract

Scientific simulations on high-performance computing (HPC) platforms generate large quantities of data. To bridge the widening gap between compute and I/O, and to enable data to be stored and analyzed more efficiently, simulation outputs need to be refactored, reduced, and appropriately mapped to storage tiers. However, a systematic solution to support these steps has been lacking in the current HPC software ecosystem. To that end, this paper develops SIRIUS, a progressive, JPEG-like data management scheme for storing and analyzing big scientific data. It co-designs data decimation, compression, and data storage, taking the hardware characteristics of each storage tier into consideration. With reasonably low overhead, our approach refactors simulation data, using either topological or uniform decimation, into a much smaller, reduced-accuracy base dataset and a series of deltas that are used to augment the accuracy if needed. The base dataset and deltas are compressed and written to multiple storage tiers. Data saved on different tiers can then be selectively retrieved to restore the level of accuracy required by the data analytics. Thus, SIRIUS provides a paradigm shift towards elastic data analytics and enables end users to make trade-offs between analysis speed and accuracy on the fly. This paper further develops algorithms to preserve statistics during data decimation, a common requirement for reducing data. Here, we assess the impact of SIRIUS on unstructured triangular meshes, a pervasive data model used in scientific simulations. In particular, we evaluate two realistic use cases: blob detection in fusion and high-pressure area extraction in computational fluid dynamics.
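As a rough illustration of the base-plus-deltas idea described in the abstract, the Python sketch below refactors a 1-D array with uniform decimation (keep every other sample) and predicts the dropped samples by linear interpolation; each delta stores only the prediction error. This is a simplified, hypothetical stand-in, not the SIRIUS implementation: the paper works on unstructured triangular meshes, also uses topological decimation, and compresses each piece and places it on a storage tier, none of which is modeled here. The names `refactor`, `restore`, and `_predict` are illustrative only.

```python
from typing import List, Tuple


def _predict(base: List[float], n: int) -> List[float]:
    """Upsample a base (the even-indexed samples) back to length n.

    Even positions are copied from the base; odd positions are linearly
    interpolated from their two even neighbours, or copied from the left
    neighbour when there is no right neighbour.
    """
    out = [0.0] * n
    for i in range(n):
        if i % 2 == 0:
            out[i] = base[i // 2]
        elif i + 1 < n:
            out[i] = 0.5 * (base[(i - 1) // 2] + base[(i + 1) // 2])
        else:
            out[i] = base[(i - 1) // 2]
    return out


def refactor(values: List[float], levels: int) -> Tuple[List[float], List[List[float]], List[int]]:
    """Split values into a coarse base plus one delta list per level.

    Returns (base, deltas, lengths), ordered coarsest level first so the
    deltas can be applied in sequence during restoration.
    """
    deltas: List[List[float]] = []
    lengths: List[int] = []
    current = list(values)
    for _ in range(levels):
        n = len(current)
        if n < 2:
            break  # nothing left to decimate
        base = current[0::2]  # uniform decimation: keep every other sample
        pred = _predict(base, n)
        # Even positions match the prediction exactly, so only odd ones are stored.
        deltas.append([current[i] - pred[i] for i in range(1, n, 2)])
        lengths.append(n)
        current = base
    deltas.reverse()
    lengths.reverse()
    return current, deltas, lengths


def restore(base: List[float], deltas: List[List[float]], lengths: List[int], use: int) -> List[float]:
    """Rebuild the data from the base and the first `use` deltas.

    Levels whose delta is not used are filled by interpolation alone, so
    fetching more deltas (e.g. from slower tiers) raises the accuracy.
    """
    current = list(base)
    for level, n in enumerate(lengths):
        pred = _predict(current, n)
        if level < use:
            for k, i in enumerate(range(1, n, 2)):
                pred[i] += deltas[level][k]
        current = pred
    return current


if __name__ == "__main__":
    data = [0.0, 2.0, 3.0, 5.0, 4.0, 1.0, 0.0, 2.0, 6.0]
    base, deltas, lengths = refactor(data, levels=2)
    print(base)                                   # [0.0, 4.0, 6.0]
    print(restore(base, deltas, lengths, use=0))  # [0.0, 1.0, 2.0, 3.0, 4.0, 4.5, 5.0, 5.5, 6.0]
    print(restore(base, deltas, lengths, use=2) == data)  # True
```

Reading only the base gives a smooth, low-accuracy approximation; each additional delta restores one level of detail, and applying all of them recovers the original values. Predicting the dropped samples from their neighbours tends to keep the deltas small for smooth fields, which is what would make them compress well in a real system.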

Authors:
Qiao, Zhenbo [1]; Lu, Tao [1]; Luo, Huizhang [1]; Liu, Qing [1]; Klasky, Scott A. [2]; Podhorszki, Norbert [2]; Wang, Jinzhen [1]
  1. New Jersey Inst. of Technology, Newark, NJ (United States)
  2. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Publication Date:
December 14, 2018
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1559600
Grant/Contract Number:  
AC05-00OR22725
Resource Type:
Accepted Manuscript
Journal Name:
IEEE Transactions on Multi-Scale Computing Systems
Additional Journal Information:
Journal Volume: 4; Journal Issue: 4; Journal ID: ISSN 2372-207X
Publisher:
IEEE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; high-performance computing; data analytics; storage; data reduction; compression; progressive refactoring

Citation Formats

Qiao, Zhenbo, Lu, Tao, Luo, Huizhang, Liu, Qing, Klasky, Scott A., Podhorszki, Norbert, and Wang, Jinzhen. SIRIUS: Enabling Progressive Data Exploration for Extreme-Scale Scientific Data. United States: N. p., 2018. Web. doi:10.1109/TMSCS.2018.2886851.
Qiao, Zhenbo, Lu, Tao, Luo, Huizhang, Liu, Qing, Klasky, Scott A., Podhorszki, Norbert, & Wang, Jinzhen. SIRIUS: Enabling Progressive Data Exploration for Extreme-Scale Scientific Data. United States. https://doi.org/10.1109/TMSCS.2018.2886851
Qiao, Zhenbo, Lu, Tao, Luo, Huizhang, Liu, Qing, Klasky, Scott A., Podhorszki, Norbert, and Wang, Jinzhen. 2018. "SIRIUS: Enabling Progressive Data Exploration for Extreme-Scale Scientific Data." United States. https://doi.org/10.1109/TMSCS.2018.2886851. https://www.osti.gov/servlets/purl/1559600.
@article{osti_1559600,
title = {SIRIUS: Enabling Progressive Data Exploration for Extreme-Scale Scientific Data},
author = {Qiao, Zhenbo and Lu, Tao and Luo, Huizhang and Liu, Qing and Klasky, Scott A. and Podhorszki, Norbert and Wang, Jinzhen},
abstractNote = {Scientific simulations on high-performance computing (HPC) platforms generate large quantities of data. To bridge the widening gap between compute and I/O, and to enable data to be stored and analyzed more efficiently, simulation outputs need to be refactored, reduced, and appropriately mapped to storage tiers. However, a systematic solution to support these steps has been lacking in the current HPC software ecosystem. To that end, this paper develops SIRIUS, a progressive, JPEG-like data management scheme for storing and analyzing big scientific data. It co-designs data decimation, compression, and data storage, taking the hardware characteristics of each storage tier into consideration. With reasonably low overhead, our approach refactors simulation data, using either topological or uniform decimation, into a much smaller, reduced-accuracy base dataset and a series of deltas that are used to augment the accuracy if needed. The base dataset and deltas are compressed and written to multiple storage tiers. Data saved on different tiers can then be selectively retrieved to restore the level of accuracy required by the data analytics. Thus, SIRIUS provides a paradigm shift towards elastic data analytics and enables end users to make trade-offs between analysis speed and accuracy on the fly. This paper further develops algorithms to preserve statistics during data decimation, a common requirement for reducing data. Here, we assess the impact of SIRIUS on unstructured triangular meshes, a pervasive data model used in scientific simulations. In particular, we evaluate two realistic use cases: blob detection in fusion and high-pressure area extraction in computational fluid dynamics.},
doi = {10.1109/TMSCS.2018.2886851},
journal = {IEEE Transactions on Multi-Scale Computing Systems},
number = 4,
volume = 4,
place = {United States},
year = {2018},
month = dec
}