OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Querying Large Scientific Data Sets with Adaptable IO System ADIOS

Abstract

When working with a large dataset, a relatively small fraction of data records are of interest in each analysis operation. For example, while examining a billion-particle dataset from an accelerator model, the scientists might focus on a few thousand fastest particles, or on the particle farthest from the beam center. In general, this type of selective data access is challenging because the selected data records could be anywhere in the dataset and require a significant amount of time to locate and retrieve. In this paper, we report our experience of addressing this data access challenge with the Adaptable IO System ADIOS. More specifically, we design a query interface for ADIOS to allow arbitrary combinations of range conditions on known variables, implement a number of different mechanisms for resolving these selection conditions, and devise strategies to reduce the time needed to retrieve the scattered data records. In many cases, the query mechanism can retrieve the selected data records orders of magnitude faster than the brute-force approach. Our work relies heavily on the in situ data processing feature of ADIOS to allow user functions to be executed in the data transport pipeline. This feature allows us to build indexes for efficient query processing, and to perform other intricate analyses while the data is in memory.

Authors:
Gu, Junmin [1]; Klasky, Scott A. [2]; Podhorszki, Norbert [2]; Qiang, Ji [1]; Wu, Kesheng [1]
  1. Lawrence Berkeley National Laboratory (LBNL)
  2. Oak Ridge National Laboratory (ORNL)
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
OSTI Identifier:
1560494
DOE Contract Number:  
AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Journal Volume: 10776; Conference: Asian Conference on Supercomputing Frontiers (SCFA 2018), Singapore, 3/26/2018-3/29/2018
Country of Publication:
United States
Language:
English

Citation Formats

Gu, Junmin, Klasky, Scott A., Podhorszki, Norbert, Qiang, Ji, and Wu, Kesheng. Querying Large Scientific Data Sets with Adaptable IO System ADIOS. United States: N. p., 2018. Web. doi:10.1007/978-3-319-69953-0_4.
Gu, Junmin, Klasky, Scott A., Podhorszki, Norbert, Qiang, Ji, & Wu, Kesheng. Querying Large Scientific Data Sets with Adaptable IO System ADIOS. United States. doi:10.1007/978-3-319-69953-0_4.
Gu, Junmin, Klasky, Scott A., Podhorszki, Norbert, Qiang, Ji, and Wu, Kesheng. 2018. "Querying Large Scientific Data Sets with Adaptable IO System ADIOS". United States. doi:10.1007/978-3-319-69953-0_4. https://www.osti.gov/servlets/purl/1560494.
@article{osti_1560494,
title = {Querying Large Scientific Data Sets with Adaptable IO System ADIOS},
author = {Gu, Junmin and Klasky, Scott A. and Podhorszki, Norbert and Qiang, Ji and Wu, Kesheng},
abstractNote = {When working with a large dataset, a relatively small fraction of data records are of interest in each analysis operation. For example, while examining a billion-particle dataset from an accelerator model, the scientists might focus on a few thousand fastest particles, or on the particle farthest from the beam center. In general, this type of selective data access is challenging because the selected data records could be anywhere in the dataset and require a significant amount of time to locate and retrieve. In this paper, we report our experience of addressing this data access challenge with the Adaptable IO System ADIOS. More specifically, we design a query interface for ADIOS to allow arbitrary combinations of range conditions on known variables, implement a number of different mechanisms for resolving these selection conditions, and devise strategies to reduce the time needed to retrieve the scattered data records. In many cases, the query mechanism can retrieve the selected data records orders of magnitude faster than the brute-force approach. Our work relies heavily on the in situ data processing feature of ADIOS to allow user functions to be executed in the data transport pipeline. This feature allows us to build indexes for efficient query processing, and to perform other intricate analyses while the data is in memory.},
doi = {10.1007/978-3-319-69953-0_4},
journal = {},
issn = {0302-9743},
number = {},
volume = 10776,
place = {United States},
year = {2018},
month = {3}
}

