OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Damsel: A Data Model Storage Library for Exascale Science

Abstract

Computational science applications have been described as having one of seven motifs (the “seven dwarfs”), each having a particular pattern of computation and communication. From a storage and I/O perspective, these applications can also be grouped into a number of data model motifs describing the way data is organized and accessed during simulation, analysis, and visualization. Major storage data models developed in the 1990s, such as Network Common Data Format (netCDF) and Hierarchical Data Format (HDF) projects, created support for more complex data models. Development of both netCDF and HDF5 was influenced by multi-dimensional dataset storage requirements, but their access models and formats were designed with sequential storage in mind (e.g., a POSIX I/O model). Although these and other high-level I/O libraries have had a beneficial impact on large parallel applications, they do not always attain a high percentage of peak I/O performance due to fundamental design limitations, and they do not address the full range of current and future computational science data models. The goal of this project is to enable exascale computational science applications to interact conveniently and efficiently with storage through abstractions that match their data models. The project consists of three major activities: (1) identifying major data model motifs in computational science applications and developing representative benchmarks; (2) developing a data model storage library, called Damsel, that supports these motifs, provides efficient storage data layouts, incorporates optimizations to enable exascale operation, and is tolerant to failures; and (3) productizing Damsel and working with computational scientists to encourage adoption of this library by the scientific community. The product of this project, Damsel library, is openly available for download from http://cucis.ece.northwestern.edu/projects/DAMSEL. Several case studies and application programming interface reference are also available to assist new users to learn to use the library.

Authors:
Choudhary, Alok [1]; Liao, Wei-keng [1]
  1. Northwestern Univ., Evanston, IL (United States)
Publication Date:
2014-07-11
Research Org.:
Northwestern Univ., Evanston, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
OSTI Identifier:
1136619
Report Number(s):
DOE-NORTHWESTERN-SC-0005309
DOE Contract Number:  
SC0005309
Resource Type:
Technical Report
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; parallel I/O; data model; I/O library; data storage

Citation Formats

Choudhary, Alok, and Liao, Wei-keng. Damsel: A Data Model Storage Library for Exascale Science. United States: N. p., 2014. Web. doi:10.2172/1136619.
Choudhary, Alok, & Liao, Wei-keng. Damsel: A Data Model Storage Library for Exascale Science. United States. https://doi.org/10.2172/1136619
Choudhary, Alok, and Liao, Wei-keng. 2014. "Damsel: A Data Model Storage Library for Exascale Science". United States. https://doi.org/10.2172/1136619. https://www.osti.gov/servlets/purl/1136619.
@article{osti_1136619,
title = {Damsel: A Data Model Storage Library for Exascale Science},
author = {Choudhary, Alok and Liao, Wei-keng},
abstractNote = {Computational science applications have been described as having one of seven motifs (the “seven dwarfs”), each having a particular pattern of computation and communication. From a storage and I/O perspective, these applications can also be grouped into a number of data model motifs describing the way data is organized and accessed during simulation, analysis, and visualization. Major storage data models developed in the 1990s, such as Network Common Data Format (netCDF) and Hierarchical Data Format (HDF) projects, created support for more complex data models. Development of both netCDF and HDF5 was influenced by multi-dimensional dataset storage requirements, but their access models and formats were designed with sequential storage in mind (e.g., a POSIX I/O model). Although these and other high-level I/O libraries have had a beneficial impact on large parallel applications, they do not always attain a high percentage of peak I/O performance due to fundamental design limitations, and they do not address the full range of current and future computational science data models. The goal of this project is to enable exascale computational science applications to interact conveniently and efficiently with storage through abstractions that match their data models. The project consists of three major activities: (1) identifying major data model motifs in computational science applications and developing representative benchmarks; (2) developing a data model storage library, called Damsel, that supports these motifs, provides efficient storage data layouts, incorporates optimizations to enable exascale operation, and is tolerant to failures; and (3) productizing Damsel and working with computational scientists to encourage adoption of this library by the scientific community. The product of this project, Damsel library, is openly available for download from http://cucis.ece.northwestern.edu/projects/DAMSEL. 
Several case studies and application programming interface reference are also available to assist new users to learn to use the library.},
doi = {10.2172/1136619},
url = {https://www.osti.gov/biblio/1136619},
place = {United States},
year = {2014},
month = jul
}