OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: High-performance scientific data management system.

Abstract

Many scientific applications have large I/O requirements, in terms of both the size of data and the number of files or data sets. Management, storage, efficient access, and analysis of this data present an extremely challenging task. Traditionally, two different solutions have been used for this task: file I/O or databases. File I/O can provide high performance but is tedious to use with large numbers of files and large and complex data sets. Databases can be convenient, flexible, and powerful but do not perform and scale well for parallel supercomputing applications. We have developed a software system, called Scientific Data Manager (SDM), that combines the good features of both file I/O and databases. SDM provides a high-level application programming interface to the user and, internally, uses a parallel file system to store real data (using various I/O optimizations available in MPI-IO) and a database to store application-related metadata. In order to support I/O in irregular applications, SDM makes extensive use of MPI-IO's noncontiguous collective I/O functions. Moreover, SDM uses the concept of a history file to optimize the cost of the index distribution using the metadata stored in database. We describe the design and implementation of SDM and present performance results with two regular applications, ASTRO3D and an Euler solver, and with two irregular applications, a CFD code called FUN3D and a Rayleigh-Taylor instability code.
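
The split the abstract describes, with bulk data in files and descriptive metadata in a database, can be illustrated with a minimal single-process sketch. This is not SDM's API. The class name `TinyDataManager`, its methods, and the table layout are hypothetical. An ordinary local file stands in for the parallel file system and MPI-IO, and Python's built-in `sqlite3` stands in for the database.

```python
import os
import sqlite3
import struct


class TinyDataManager:
    """Hypothetical illustration: raw values go to a flat file, metadata to SQLite."""

    def __init__(self, directory):
        # Assumption: one data file and one metadata database per directory.
        self.data_path = os.path.join(directory, "data.bin")
        self.db = sqlite3.connect(os.path.join(directory, "meta.db"))
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS datasets ("
            "name TEXT, step INTEGER, offset INTEGER, count INTEGER, "
            "PRIMARY KEY (name, step))"
        )
        # Create the data file if it does not exist yet.
        open(self.data_path, "ab").close()

    def write(self, name, step, values):
        """Append the values to the data file and record where they landed."""
        offset = os.path.getsize(self.data_path)
        with open(self.data_path, "ab") as f:
            f.write(struct.pack(f"<{len(values)}d", *values))
        self.db.execute(
            "INSERT OR REPLACE INTO datasets VALUES (?, ?, ?, ?)",
            (name, step, offset, len(values)),
        )
        self.db.commit()

    def read(self, name, step):
        """Look up offset and length in the database, then read the raw bytes."""
        row = self.db.execute(
            "SELECT offset, count FROM datasets WHERE name = ? AND step = ?",
            (name, step),
        ).fetchone()
        if row is None:
            raise KeyError((name, step))
        offset, count = row
        with open(self.data_path, "rb") as f:
            f.seek(offset)
            return list(struct.unpack(f"<{count}d", f.read(8 * count)))

    def close(self):
        self.db.close()
```

A caller would write with `manager.write("pressure", 0, [1.0, 2.0])` and later read the same values back with `manager.read("pressure", 0)`, without tracking file names or byte offsets itself. The sketch covers only the metadata lookup. The collective, noncontiguous I/O and the history file mentioned in the abstract are outside its scope.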

Authors:
No, J; Thakur, R; Choudhary, A
Publication Date:
Apr. 2003
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC); National Science Foundation (NSF)
OSTI Identifier:
961305
Report Number(s):
ANL/MCS/JA-46414
TRN: US201011%%578
DOE Contract Number:  
DE-AC02-06CH11357
Resource Type:
Journal Article
Journal Name:
J. Parallel Distrib. Comput.
Additional Journal Information:
Journal Volume: 63; Journal Issue: 4; Apr. 2003
Country of Publication:
United States
Language:
ENGLISH
Subject:
32 ENERGY CONSERVATION, CONSUMPTION, AND UTILIZATION; 97 MATHEMATICAL METHODS AND COMPUTING; 99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; COMPUTER CODES; COST; DATA; DATA PROCESSING; DATA BASE MANAGEMENT; DESIGN; DISTRIBUTION; FUNCTIONS; IMPLEMENTATION; INTERFACES; PARALLEL PROCESSING; PERFORMANCE; PROGRAMMING; RAYLEIGH-TAYLOR INSTABILITY; SIZE; STORAGE

Citation Formats

No, J, Thakur, R, Choudhary, A, Mathematics and Computer Science, Sejong Univ., and Northwestern Univ. High-performance scientific data management system. United States: N. p., 2003. Web. doi:10.1016/S0743-7315(03)00036-4.
No, J, Thakur, R, Choudhary, A, Mathematics and Computer Science, Sejong Univ., & Northwestern Univ. High-performance scientific data management system. United States. doi:10.1016/S0743-7315(03)00036-4.
No, J, Thakur, R, Choudhary, A, Mathematics and Computer Science, Sejong Univ., and Northwestern Univ. Tue . "High-performance scientific data management system.". United States. doi:10.1016/S0743-7315(03)00036-4.
@article{osti_961305,
title = {High-performance scientific data management system.},
author = {No, J and Thakur, R and Choudhary, A},
abstractNote = {Many scientific applications have large I/O requirements, in terms of both the size of data and the number of files or data sets. Management, storage, efficient access, and analysis of this data present an extremely challenging task. Traditionally, two different solutions have been used for this task: file I/O or databases. File I/O can provide high performance but is tedious to use with large numbers of files and large and complex data sets. Databases can be convenient, flexible, and powerful but do not perform and scale well for parallel supercomputing applications. We have developed a software system, called Scientific Data Manager (SDM), that combines the good features of both file I/O and databases. SDM provides a high-level application programming interface to the user and, internally, uses a parallel file system to store real data (using various I/O optimizations available in MPI-IO) and a database to store application-related metadata. In order to support I/O in irregular applications, SDM makes extensive use of MPI-IO's noncontiguous collective I/O functions. Moreover, SDM uses the concept of a history file to optimize the cost of the index distribution using the metadata stored in database. We describe the design and implementation of SDM and present performance results with two regular applications, ASTRO3D and an Euler solver, and with two irregular applications, a CFD code called FUN3D and a Rayleigh-Taylor instability code.},
doi = {10.1016/S0743-7315(03)00036-4},
journal = {J. Parallel Distrib. Comput.},
number = {4},
volume = 63,
place = {United States},
year = {2003},
month = {4}
}