Parallel Index and Query for Large Scale Data Analysis
Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies are critical for the interactive exploration of large datasets, but designing a system that can process general scientific datasets remains difficult: the system needs to run on distributed multi-core platforms, use the underlying I/O infrastructure efficiently, and scale to massive datasets. We present FastQuery, a novel software framework that addresses these challenges. FastQuery uses a state-of-the-art index and query technology (FastBit) and is designed to process massive datasets on modern supercomputing platforms. We apply FastQuery to a 50 TB dataset generated by a large-scale accelerator modeling code and demonstrate that the tool scales to 11,520 cores. Motivated by the scientific need to search for interesting particles in this dataset, we use our framework to reduce search time from hours to tens of seconds.
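FastBit is built on bitmap indexes. As a rough, single-process illustration of why such an index can answer a particle search without scanning all raw data, the sketch below builds a binned bitmap index over one variable and answers a one-sided range query such as "value > threshold". Bins that lie wholly above the threshold are answered from their bitmaps alone; only records in the one bin that straddles the threshold are checked against raw values. The class `BinnedBitmapIndex`, its method `query_greater`, and the sample data are hypothetical. This is not FastBit's or FastQuery's API, it omits the bitmap compression FastBit applies, and it does not model FastQuery's parallel processing.

```python
from bisect import bisect_right


class BinnedBitmapIndex:
    """Hypothetical, uncompressed binned bitmap index over one variable.

    Bin k covers the half-open range [edges[k], edges[k + 1]). Its bitmap is
    a Python int used as a bit vector: bit i is set when record i is in bin k.
    """

    def __init__(self, values, edges):
        self.values = list(values)
        self.edges = list(edges)
        self.bitmaps = [0] * (len(self.edges) - 1)
        for i, v in enumerate(self.values):
            k = bisect_right(self.edges, v) - 1
            if not 0 <= k < len(self.bitmaps):
                raise ValueError(f"value {v} lies outside the bin edges")
            self.bitmaps[k] |= 1 << i

    def query_greater(self, threshold):
        """Return the sorted record ids whose value is > threshold."""
        hits = 0        # records known to satisfy the condition
        candidates = 0  # records whose raw values must be checked
        for k, bitmap in enumerate(self.bitmaps):
            lo, hi = self.edges[k], self.edges[k + 1]
            if lo > threshold:
                hits |= bitmap        # whole bin lies above the threshold
            elif hi > threshold:
                candidates |= bitmap  # bin straddles the threshold
            # otherwise the whole bin lies at or below the threshold
        # Candidate check: read raw values only for the straddling bin.
        i = 0
        while candidates:
            if candidates & 1 and self.values[i] > threshold:
                hits |= 1 << i
            candidates >>= 1
            i += 1
        return [i for i in range(len(self.values)) if (hits >> i) & 1]


# Usage: seven records, five bins of width 2 over [0, 10).
values = [0.5, 2.3, 7.1, 4.4, 9.9, 3.0, 6.2]
index = BinnedBitmapIndex(values, edges=[0, 2, 4, 6, 8, 10])
print(index.query_greater(5.0))  # -> [2, 4, 6]
```

In this example the query touches the raw value of only one record (4.4, in bin [4, 6)); the bins [6, 8) and [8, 10) are taken whole from their bitmaps. Finer bins shrink the candidate set at the cost of more bitmaps, which is the basic trade-off a binned index tunes.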
- Research Organization: Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
- Sponsoring Organization: Computational Research Division
- DOE Contract Number: DE-AC02-05CH11231
- OSTI ID: 1056552
- Report Number(s): LBNL-5317E
- Resource Relation: Conference: SC11, Seattle, WA, USA, November 12-18, 2011
- Country of Publication: United States
- Language: English