OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Damsel: A Data Model Storage Library for Exascale Science

Abstract

Computational science applications have been described as having one of seven motifs (the “seven dwarfs”), each having a particular pattern of computation and communication. From a storage and I/O perspective, these applications can also be grouped into a number of data model motifs describing the way data is organized and accessed during simulation, analysis, and visualization. Major storage data models developed in the 1990s, such as Network Common Data Format (netCDF) and Hierarchical Data Format (HDF) projects, created support for more complex data models. Development of both netCDF and HDF5 was influenced by multi-dimensional dataset storage requirements, but their access models and formats were designed with sequential storage in mind (e.g., a POSIX I/O model). Although these and other high-level I/O libraries have had a beneficial impact on large parallel applications, they do not always attain a high percentage of peak I/O performance due to fundamental design limitations, and they do not address the full range of current and future computational science data models. The goal of this project is to enable exascale computational science applications to interact conveniently and efficiently with storage through abstractions that match their data models. The project consists of three major activities: (1) identifying major data model motifs in computational science applications and developing representative benchmarks; (2) developing a data model storage library, called Damsel, that supports these motifs, provides efficient storage data layouts, incorporates optimizations to enable exascale operation, and is tolerant to failures; and (3) productizing Damsel and working with computational scientists to encourage adoption of this library by the scientific community. The product of this project, Damsel library, is openly available for download from http://cucis.ece.northwestern.edu/projects/DAMSEL. Several case studies and application programming interface reference are also available to assist new users to learn to use the library.

Authors:
Choudhary, Alok [1]; Liao, Wei-keng [1]
  1. Northwestern Univ., Evanston, IL (United States)
Publication Date:
2014-07-11
Research Org.:
Northwestern Univ., Evanston, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
OSTI Identifier:
1136619
Report Number(s):
DOE-NORTHWESTERN-SC-0005309
DOE Contract Number:
SC0005309
Resource Type:
Technical Report
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; parallel I/O; data model; I/O library; data storage

Citation Formats

Choudhary, Alok, and Liao, Wei-keng. Damsel: A Data Model Storage Library for Exascale Science. United States: N. p., 2014. Web. doi:10.2172/1136619.
Choudhary, Alok, & Liao, Wei-keng. (2014). Damsel: A Data Model Storage Library for Exascale Science. United States. doi:10.2172/1136619.
Choudhary, Alok, and Liao, Wei-keng. 2014. "Damsel: A Data Model Storage Library for Exascale Science". United States. doi:10.2172/1136619. https://www.osti.gov/servlets/purl/1136619.
@article{osti_1136619,
title = {Damsel: A Data Model Storage Library for Exascale Science},
author = {Choudhary, Alok and Liao, Wei-keng},
abstractNote = {Computational science applications have been described as having one of seven motifs (the “seven dwarfs”), each having a particular pattern of computation and communication. From a storage and I/O perspective, these applications can also be grouped into a number of data model motifs describing the way data is organized and accessed during simulation, analysis, and visualization. Major storage data models developed in the 1990s, such as Network Common Data Format (netCDF) and Hierarchical Data Format (HDF) projects, created support for more complex data models. Development of both netCDF and HDF5 was influenced by multi-dimensional dataset storage requirements, but their access models and formats were designed with sequential storage in mind (e.g., a POSIX I/O model). Although these and other high-level I/O libraries have had a beneficial impact on large parallel applications, they do not always attain a high percentage of peak I/O performance due to fundamental design limitations, and they do not address the full range of current and future computational science data models. The goal of this project is to enable exascale computational science applications to interact conveniently and efficiently with storage through abstractions that match their data models. The project consists of three major activities: (1) identifying major data model motifs in computational science applications and developing representative benchmarks; (2) developing a data model storage library, called Damsel, that supports these motifs, provides efficient storage data layouts, incorporates optimizations to enable exascale operation, and is tolerant to failures; and (3) productizing Damsel and working with computational scientists to encourage adoption of this library by the scientific community. The product of this project, Damsel library, is openly available for download from http://cucis.ece.northwestern.edu/projects/DAMSEL. 
Several case studies and application programming interface reference are also available to assist new users to learn to use the library.},
doi = {10.2172/1136619},
place = {United States},
year = {2014},
month = jul
}

  • The goal of this project is to enable exascale computational science applications to interact conveniently and efficiently with storage through abstractions that match their data models. We will accomplish this through three major activities: (1) identifying major data model motifs in computational science applications and developing representative benchmarks; (2) developing a data model storage library, called Damsel, that supports these motifs, provides efficient storage data layouts, incorporates optimizations to enable exascale operation, and is tolerant to failures; and (3) productizing Damsel and working with computational scientists to encourage adoption of this library by the scientific community.
  • Over the next 10 years, the Department of Energy will be transitioning from Petascale to Exascale Computing, causing data storage, networking, and infrastructure requirements to increase by three orders of magnitude. The technologies and best practices used today are the result of a relatively slow evolution of ancestral technologies developed in the 1950s and 1960s. These include magnetic tape, magnetic disk, networking, databases, file systems, and operating systems. These technologies will continue to evolve over the next 10 to 15 years on a reasonably predictable path. Experience with the challenges involved in transitioning these fundamental technologies from Terascale to Petascale computing systems has raised questions about how these will scale another 3 or 4 orders of magnitude to meet the requirements imposed by Exascale computing systems. This report is focused on the most concerning scaling issues with data storage systems as they relate to High Performance Computing and presents options for a path forward. Given the ability to store exponentially increasing amounts of data, far more advanced concepts and use of metadata will be critical to managing data in Exascale computing systems.
  • Over the past five years, Lawrence Livermore National Laboratory has published extensive compilations derived from three of its main atomic data libraries. These are specifically the evaluated atomic relaxation data library, EADL, the evaluated electron interaction data library, EEDL, and the evaluated photon interaction data library, EPDL. All of these libraries span atomic numbers, Z, from 1 to 100. Additionally, the particle interaction libraries cover the incident particle energy range from 10 eV to 100 GeV. The purpose of these libraries is to furnish data for particle transport calculations. Thus the files have been released for external distribution in a machine independent character format. In a complete coupled electron-photon transport analysis, results from all three of the data files are required. Therefore it is reasonable to discuss the format for all three libraries in the same work; that is the approach taken here. This report is composed of three sections, each section describing one of the libraries. For ease of reading, each section is separate and unique unto itself, including its own table numbers and references. This report will accompany any request for copies of these evaluated data libraries. This report and these three data libraries are available from the data centers at Brookhaven National Laboratory, RSIC (Oak Ridge National Laboratory), OECD/NEA Data Bank (France), and IAEA (Vienna).
  • The character file formats for the Lawrence Livermore National Laboratory evaluated atomic relaxation library (EADL), the electron library (EEDL), and the photon library (EPDL) are given in this report.