
Title: FastQuery: A Parallel Indexing System for Scientific Data

Abstract

Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies such as FastBit can significantly improve access to these datasets by augmenting the user data with indexes and other secondary information. However, these indexes assume the relational data model, whereas scientific data generally follows the array data model. To match the two data models, we design a generic mapping mechanism and implement an efficient input and output interface for reading and writing the data and their corresponding indexes. To take advantage of emerging many-core architectures, we also develop a parallel indexing strategy based on threading. This approach complements our ongoing MPI-based parallelization efforts. We demonstrate the flexibility of our software by applying it to two of the most commonly used scientific data formats, HDF5 and NetCDF. We present two case studies using data from a particle accelerator model and a global climate model. We also conduct a detailed performance study using these scientific datasets. The results show that FastQuery speeds up queries by a factor of 2.5 to 50 and reduces indexing time by a factor of 16 on 24 cores.
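The abstract names two ideas that a small example can make concrete: mapping array variables onto relational-style columns, and building the index in parallel with threads. The following is a minimal sketch of both, not FastQuery's or FastBit's actual API. Every name in it (`flatten`, `bin_of`, `build_index`, `rows_greater`, `to_coords`), the sample data, and the bin edges are assumptions made for illustration. It also assumes a simple binned bitmap index with Python integers as bitmaps, and it only shows the structure of a threaded build; Python threads would not reproduce the reported speedups.

```python
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor


def flatten(array):
    """Flatten a nested list in row-major order; list position becomes the row id."""
    if not isinstance(array, list):
        return [array]
    return [v for item in array for v in flatten(item)]


def bin_of(value, edges):
    """Index of the bin [edges[i], edges[i+1]) holding value, clamped to valid bins."""
    return min(max(bisect_right(edges, value) - 1, 0), len(edges) - 2)


def build_chunk(column, start, stop, edges):
    """Build one bitmap per bin for rows start..stop-1 (bitmaps are Python ints)."""
    bitmaps = [0] * (len(edges) - 1)
    for row in range(start, stop):
        bitmaps[bin_of(column[row], edges)] |= 1 << row
    return bitmaps


def build_index(column, edges, workers=4):
    """Index disjoint row ranges in separate threads, then OR the partial bitmaps."""
    n = len(column)
    step = max(1, -(-n // workers))  # ceiling division
    ranges = [(s, min(s + step, n)) for s in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda r: build_chunk(column, r[0], r[1], edges), ranges))
    index = [0] * (len(edges) - 1)
    for part in parts:
        index = [a | b for a, b in zip(index, part)]
    return index


def rows_greater(column, index, edges, threshold):
    """Bitmap of rows with column[row] > threshold."""
    boundary = bin_of(threshold, edges)
    hits = 0
    for bitmap in index[boundary + 1:]:  # bins entirely above the threshold
        hits |= bitmap
    candidates = index[boundary]  # boundary bin: check the raw values
    row = 0
    while candidates:
        if candidates & 1 and column[row] > threshold:
            hits |= 1 << row
        candidates >>= 1
        row += 1
    return hits


def to_coords(bitmap, shape):
    """Translate set bits (row ids) back to (i, j) coordinates of a 2-D array."""
    coords, row = [], 0
    while bitmap:
        if bitmap & 1:
            coords.append(divmod(row, shape[1]))
        bitmap >>= 1
        row += 1
    return coords


# Two hypothetical 2x3 array variables defined on the same grid.
temperature = [[12.0, 27.5, 31.0], [18.0, 29.0, 8.5]]
pressure = [[990.0, 1012.0, 1003.0], [1020.0, 1001.0, 1015.0]]
shape = (2, 3)

# Array-to-relational mapping: each variable is a column, each cell is a row.
temp_col, pres_col = flatten(temperature), flatten(pressure)
temp_edges = [0.0, 10.0, 20.0, 30.0, 40.0]
pres_edges = [980.0, 1000.0, 1020.0, 1040.0]
temp_index = build_index(temp_col, temp_edges, workers=2)
pres_index = build_index(pres_col, pres_edges, workers=2)

# Query: temperature > 25 AND pressure > 1002, answered by AND-ing bitmaps.
selected = rows_greater(temp_col, temp_index, temp_edges, 25.0) & \
    rows_greater(pres_col, pres_index, pres_edges, 1002.0)
result = to_coords(selected, shape)
```

In this sketch only the bin that contains the threshold requires a look at the raw values; bins entirely above it are answered from the bitmaps alone, which is the general reason a bitmap index can avoid scanning the full dataset.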

Authors:
Chou, Jerry; Wu, Kesheng; Prabhat
Publication Date:
07/29/2011
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
Computational Research Division
OSTI Identifier:
1056551
Report Number(s):
LBNL-5315E
DOE Contract Number:  
DE-AC02-05CH11231
Resource Type:
Conference
Resource Relation:
Conference: IASDS 2011: Workshop on Interfaces and Abstractions for Scientific Data Storage, Austin, TX, USA, 09/30/2011
Country of Publication:
United States
Language:
English
Subject:
96 KNOWLEDGE MANAGEMENT AND PRESERVATION; 97 MATHEMATICS AND COMPUTING

Citation Formats

Chou, Jerry, Wu, Kesheng, and Prabhat. FastQuery: A Parallel Indexing System for Scientific Data. United States: N. p., 2011. Web.
Chou, Jerry, Wu, Kesheng, & Prabhat. (2011). FastQuery: A Parallel Indexing System for Scientific Data. United States.
Chou, Jerry, Wu, Kesheng, and Prabhat. 2011. "FastQuery: A Parallel Indexing System for Scientific Data". United States. https://www.osti.gov/servlets/purl/1056551.
@inproceedings{osti_1056551,
title = {FastQuery: A Parallel Indexing System for Scientific Data},
author = {Chou, Jerry and Wu, Kesheng and Prabhat},
abstractNote = {Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies such as FastBit can significantly improve access to these datasets by augmenting the user data with indexes and other secondary information. However, these indexes assume the relational data model, whereas scientific data generally follows the array data model. To match the two data models, we design a generic mapping mechanism and implement an efficient input and output interface for reading and writing the data and their corresponding indexes. To take advantage of emerging many-core architectures, we also develop a parallel indexing strategy based on threading. This approach complements our ongoing MPI-based parallelization efforts. We demonstrate the flexibility of our software by applying it to two of the most commonly used scientific data formats, HDF5 and NetCDF. We present two case studies using data from a particle accelerator model and a global climate model. We also conduct a detailed performance study using these scientific datasets. The results show that FastQuery speeds up queries by a factor of 2.5 to 50 and reduces indexing time by a factor of 16 on 24 cores.},
url = {https://www.osti.gov/biblio/1056551},
booktitle = {IASDS 2011: Workshop on Interfaces and Abstractions for Scientific Data Storage},
place = {United States},
year = {2011},
month = jul
}

