OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Parallel membership queries on very large scientific data sets using bitmap indexes

Abstract

© 2019 John Wiley & Sons, Ltd. Many scientific applications produce very large amounts of data as advances in hardware fuel computing and experimental facilities. Managing and analyzing massive quantities of scientific data is challenging as data are often stored in specific formatted files, such as HDF5 and NetCDF, which do not offer appropriate search capabilities. In this research, we investigated a special class of search capability, called membership query, to identify whether queried elements of a set are members of an attribute. Attributes that naturally have classification values appear frequently in scientific domains such as category and object type as well as in daily life such as zip code and occupation. Because classification attribute values are discrete and require random data access, performing a membership query on a large scientific data set creates challenges. We applied bitmap indexing and parallelization to membership queries to overcome these challenges. Bitmap indexing provides high performance not only for low cardinality attributes but also for high cardinality attributes, such as floating-point variables, electric charge, or momentum in a particle physics data set, due to compression algorithms such as Word-Aligned Hybrid. We conducted experiments, in a highly parallelized environment, on data obtained from a particle accelerator model and a synthetic data set.
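
As an illustration of the idea the abstract describes, the following is a minimal sketch of answering a membership query with an uncompressed bitmap index. It is not the authors' implementation: it applies no Word-Aligned Hybrid compression and no parallelization, and the names build_bitmap_index, membership_query, and particle_type are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when values[i] equals it."""
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[value] |= 1 << row
    return dict(index)


def membership_query(index, wanted):
    """Return the sorted row numbers whose value is in `wanted`, by OR-ing bitmaps."""
    hits = 0
    for value in wanted:
        # Values that never occur in the column contribute an empty bitmap.
        hits |= index.get(value, 0)
    rows = []
    row = 0
    while hits:
        if hits & 1:
            rows.append(row)
        hits >>= 1
        row += 1
    return rows


# Hypothetical classification attribute: one particle type per row.
particle_type = ["electron", "proton", "muon", "electron", "photon", "muon"]
index = build_bitmap_index(particle_type)
matching_rows = membership_query(index, {"electron", "muon"})
```

Each distinct attribute value owns one bitmap, so a query over several values reduces to a bitwise OR of the corresponding bitmaps, without scanning the raw column.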

Authors:
Yildiz, Beytullah [1]; Wu, Kesheng [1]; Byna, Suren [1]; Shoshani, Arie [1]
  1. Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California
Publication Date:
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
OSTI Identifier:
1503658
DOE Contract Number:  
AC02-05CH11231
Resource Type:
Journal Article
Journal Name:
Concurrency and Computation. Practice and Experience
Additional Journal Information:
Journal Volume: 31; Journal Issue: 15; Journal ID: ISSN 1532-0626
Country of Publication:
United States
Language:
English

Citation Formats

Yildiz, Beytullah, Wu, Kesheng, Byna, Suren, and Shoshani, Arie. Parallel membership queries on very large scientific data sets using bitmap indexes. United States: N. p., 2019. Web. doi:10.1002/cpe.5157.
Yildiz, Beytullah, Wu, Kesheng, Byna, Suren, & Shoshani, Arie. Parallel membership queries on very large scientific data sets using bitmap indexes. United States. doi:10.1002/cpe.5157.
Yildiz, Beytullah, Wu, Kesheng, Byna, Suren, and Shoshani, Arie. 2019. "Parallel membership queries on very large scientific data sets using bitmap indexes". United States. doi:10.1002/cpe.5157.
@article{osti_1503658,
title = {Parallel membership queries on very large scientific data sets using bitmap indexes},
author = {Yildiz, Beytullah and Wu, Kesheng and Byna, Suren and Shoshani, Arie},
abstractNote = {© 2019 John Wiley & Sons, Ltd. Many scientific applications produce very large amounts of data as advances in hardware fuel computing and experimental facilities. Managing and analyzing massive quantities of scientific data is challenging as data are often stored in specific formatted files, such as HDF5 and NetCDF, which do not offer appropriate search capabilities. In this research, we investigated a special class of search capability, called membership query, to identify whether queried elements of a set are members of an attribute. Attributes that naturally have classification values appear frequently in scientific domains such as category and object type as well as in daily life such as zip code and occupation. Because classification attribute values are discrete and require random data access, performing a membership query on a large scientific data set creates challenges. We applied bitmap indexing and parallelization to membership queries to overcome these challenges. Bitmap indexing provides high performance not only for low cardinality attributes but also for high cardinality attributes, such as floating-point variables, electric charge, or momentum in a particle physics data set, due to compression algorithms such as Word-Aligned Hybrid. We conducted experiments, in a highly parallelized environment, on data obtained from a particle accelerator model and a synthetic data set.},
doi = {10.1002/cpe.5157},
journal = {Concurrency and Computation. Practice and Experience},
issn = {1532-0626},
number = 15,
volume = 31,
place = {United States},
year = {2019},
month = {1}
}
