Expediting Scientific Data Analysis with Reorganization of Data
Conference · OSTI ID: 1165204
Data producers typically optimize the layout of data files to minimize write time. In most cases, however, data analysis tasks read these files with access patterns that differ from the write patterns, which leads to poor read performance. In this paper, we introduce Scientific Data Services (SDS), a framework for bridging the performance gap between writing and reading scientific data. SDS reorganizes data to match the read patterns of analysis tasks and enables transparent reads from the reorganized data. We implemented an HDF5 Virtual Object Layer (VOL) plugin to redirect HDF5 dataset read calls to the reorganized data. To demonstrate the effectiveness of SDS, we applied two parallel data organization techniques: a sort-based organization on plasma physics data and a transpose-based organization on mass spectrometry imaging data. We also extended the HDF5 data access API to allow selection of data based on their values through a query interface called SDS Query. We evaluated the execution time of accessing various subsets of data through the existing HDF5 read API and through SDS Query. We showed that reading the reorganized data using SDS is up to 55X faster than reading the original data.
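The two reorganization ideas named in the abstract can be illustrated with a small, self-contained sketch. This is not the SDS implementation and does not use HDF5 or the VOL plugin: plain Python lists stand in for on-disk datasets, and every function name below (`reorganize_by_sort`, `query_range`, `query_scan`, `transpose`) is hypothetical. It only shows why a sorted copy lets a value-based range query touch one contiguous slice instead of scanning everything, and how a transpose turns a strided access into a contiguous one.

```python
from bisect import bisect_left, bisect_right


def reorganize_by_sort(values):
    """Return a sorted copy of `values` plus each element's original position."""
    order = sorted(range(len(values)), key=values.__getitem__)
    return [values[i] for i in order], order


def query_range(sorted_values, order, lo, hi):
    """Original positions of elements with lo <= value <= hi.

    Two binary searches locate one contiguous slice of the sorted copy,
    so only matching elements are read.
    """
    start = bisect_left(sorted_values, lo)
    stop = bisect_right(sorted_values, hi)
    return sorted(order[start:stop])


def query_scan(values, lo, hi):
    """Baseline: scan every element of the original (write-order) layout."""
    return [i for i, v in enumerate(values) if lo <= v <= hi]


def transpose(rows):
    """Swap rows and columns so a former column becomes a contiguous row."""
    return [list(col) for col in zip(*rows)]


# Hypothetical particle energies, stored in write order.
energy = [0.3, 2.5, 1.1, 4.0, 0.9, 3.2]
sorted_energy, order = reorganize_by_sort(energy)

# Both approaches select the same elements; only the access pattern differs.
hits_sorted = query_range(sorted_energy, order, 1.0, 3.5)
hits_scan = query_scan(energy, 1.0, 3.5)

# Hypothetical 2 x 3 array: reading the first column of `image` is strided,
# while reading the first row of `image_t` is contiguous.
image = [[1, 2, 3], [4, 5, 6]]
image_t = transpose(image)
```

In this toy setting the sorted copy costs extra storage and a one-time sort, which mirrors the general trade-off of keeping a reorganized replica alongside the original data.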
- Research Organization: Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA (US)
- Sponsoring Organization: USDOE Office of Science (SC)
- DOE Contract Number: AC02-05CH11231
- OSTI ID: 1165204
- Report Number(s): LBNL-6387E
- Country of Publication: United States
- Language: English
Similar Records
SDS: A Framework for Scientific Data Services
Conference · Thu Oct 31 00:00:00 EDT 2013 · OSTI ID:1164907
Usage Pattern-Driven Dynamic Data Layout Reorganization
Conference · Sun May 01 00:00:00 EDT 2016 · OSTI ID:1567419
HDF5-FastQuery: Accelerating Complex Queries on HDF Datasets Using Fast Bitmap Indices
Conference · Tue Dec 06 23:00:00 EST 2005 · OSTI ID:881619