U.S. Department of Energy
Office of Scientific and Technical Information

Parallel Index and Query for Large Scale Data Analysis

Conference · OSTI ID:1056552

Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies are critical for facilitating interactive exploration of large datasets, but many challenges remain in designing a system for processing general scientific datasets. The system needs to be able to run on distributed multi-core platforms, efficiently utilize the underlying I/O infrastructure, and scale to massive datasets. We present FastQuery, a novel software framework that addresses these challenges. FastQuery utilizes a state-of-the-art index and query technology (FastBit) and is designed to process massive datasets on modern supercomputing platforms. We apply FastQuery to the processing of a massive 50 TB dataset generated by a large-scale accelerator modeling code. We demonstrate the scalability of the tool to 11,520 cores. Motivated by the scientific need to search for interesting particles in this dataset, we use our framework to reduce search time from hours to tens of seconds.

Research Organization: Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA (US)
Sponsoring Organization: Computational Research Division
DOE Contract Number: AC02-05CH11231
OSTI ID: 1056552
Report Number(s): LBNL-5317E
Country of Publication: United States
Language: English

Similar Records

FastQuery: A Parallel Indexing System for Scientific Data
Conference · Fri Jul 29 00:00:00 EDT 2011 · OSTI ID:1056551

Design of FastQuery: How to Generalize Indexing and Querying System for Scientific Data
Technical Report · Mon Apr 18 00:00:00 EDT 2011 · OSTI ID:1051264

MOSIQS: Persistent Memory Object Storage With Metadata Indexing and Querying for Scientific Computing
Journal Article · Tue Jun 08 00:00:00 EDT 2021 · IEEE Access · OSTI ID:1820827
