OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Damsel: A Data Model Storage Library for Exascale Science

Abstract

Computational science applications have been described as having one of seven motifs (the “seven dwarfs”), each with a particular pattern of computation and communication. From a storage and I/O perspective, these applications can also be grouped into a number of data model motifs describing the way data is organized and accessed during simulation, analysis, and visualization. Major storage data model efforts begun in the 1990s, such as the Network Common Data Format (netCDF) and Hierarchical Data Format (HDF) projects, created support for more complex data models. Development of both netCDF and HDF5 was influenced by multi-dimensional dataset storage requirements, but their access models and formats were designed with sequential storage in mind (e.g., a POSIX I/O model). Although these and other high-level I/O libraries have had a beneficial impact on large parallel applications, they do not always attain a high percentage of peak I/O performance due to fundamental design limitations, and they do not address the full range of current and future computational science data models. The goal of this project is to enable exascale computational science applications to interact conveniently and efficiently with storage through abstractions that match their data models. The project consists of three major activities: (1) identifying major data model motifs in computational science applications and developing representative benchmarks; (2) developing a data model storage library, called Damsel, that supports these motifs, provides efficient storage data layouts, incorporates optimizations to enable exascale operation, and is tolerant to failures; and (3) productizing Damsel and working with computational scientists to encourage adoption of this library by the scientific community. The product of this project, the Damsel library, is openly available for download from http://cucis.ece.northwestern.edu/projects/DAMSEL. Several case studies and an application programming interface reference are also available to help new users learn to use the library.
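The abstract's core contrast is between byte-stream (POSIX-style) access and access through a data model abstraction such as a named, typed, multi-dimensional dataset. As a minimal illustration only, and not Damsel's own API (its interface reference is available from the project page cited above), the C sketch below writes the same 2-D array both ways: once as raw bytes through POSIX write(), and once as an HDF5 dataset. The file names, array shape, and dataset name are arbitrary assumptions made for the example.

/* Illustrative sketch only. This is NOT Damsel's API: it contrasts a flat
   POSIX write with an HDF5 data-model write of the same array. File names,
   array shape, and dataset name are assumptions made for the example. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <hdf5.h>

#define NX 128   /* example array shape (assumption) */
#define NY 256

int main(void)
{
    double *temp = malloc((size_t)NX * NY * sizeof(double));
    if (temp == NULL) return 1;
    for (size_t i = 0; i < (size_t)NX * NY; i++)
        temp[i] = (double)i;                 /* synthetic field values */

    /* POSIX byte-stream model: the shape, type, and name of the array exist
       only inside the application; the file itself is opaque bytes. */
    int fd = open("temperature.raw", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd >= 0) {
        if (write(fd, temp, (size_t)NX * NY * sizeof(double)) < 0)
            perror("write");
        close(fd);
    }

    /* Data-model view (HDF5 used purely as an illustration): the same values
       become a named, typed, 2-D dataset that generic analysis and
       visualization tools can interpret without application-specific code. */
    hsize_t dims[2] = {NX, NY};
    hid_t file  = H5Fcreate("temperature.h5", H5F_ACC_TRUNC,
                            H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(2, dims, NULL);
    hid_t dset  = H5Dcreate2(file, "/temperature", H5T_NATIVE_DOUBLE, space,
                             H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, temp);
    H5Dclose(dset);
    H5Sclose(space);
    H5Fclose(file);

    free(temp);
    return 0;
}

A data-model library such as Damsel expresses the same intent through handles matched to an application's data model motif; the sketch only illustrates why a self-describing multi-dimensional dataset is easier for downstream tools to consume than an opaque byte stream.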

Authors:
 Choudhary, Alok [1]; Liao, Wei-keng [1]
  1. Northwestern Univ., Evanston, IL (United States)
Publication Date:
July 2014
Research Org.:
Northwestern Univ., Evanston, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
OSTI Identifier:
1136619
Report Number(s):
DOE-NORTHWESTERN-SC-0005309
DOE Contract Number:
SC0005309
Resource Type:
Technical Report
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; parallel I/O; data model; I/O library; data storage

Citation Formats

Choudhary, Alok, and Liao, Wei-keng. Damsel: A Data Model Storage Library for Exascale Science. United States: N. p., 2014. Web. doi:10.2172/1136619.
Choudhary, Alok, & Liao, Wei-keng. (2014). Damsel: A Data Model Storage Library for Exascale Science. United States. doi:10.2172/1136619.
Choudhary, Alok, and Liao, Wei-keng. 2014. "Damsel: A Data Model Storage Library for Exascale Science". United States. doi:10.2172/1136619. https://www.osti.gov/servlets/purl/1136619.
@techreport{osti_1136619,
title = {Damsel: A Data Model Storage Library for Exascale Science},
author = {Choudhary, Alok and Liao, Wei-keng},
abstractNote = {Computational science applications have been described as having one of seven motifs (the “seven dwarfs”), each with a particular pattern of computation and communication. From a storage and I/O perspective, these applications can also be grouped into a number of data model motifs describing the way data is organized and accessed during simulation, analysis, and visualization. Major storage data model efforts begun in the 1990s, such as the Network Common Data Format (netCDF) and Hierarchical Data Format (HDF) projects, created support for more complex data models. Development of both netCDF and HDF5 was influenced by multi-dimensional dataset storage requirements, but their access models and formats were designed with sequential storage in mind (e.g., a POSIX I/O model). Although these and other high-level I/O libraries have had a beneficial impact on large parallel applications, they do not always attain a high percentage of peak I/O performance due to fundamental design limitations, and they do not address the full range of current and future computational science data models. The goal of this project is to enable exascale computational science applications to interact conveniently and efficiently with storage through abstractions that match their data models. The project consists of three major activities: (1) identifying major data model motifs in computational science applications and developing representative benchmarks; (2) developing a data model storage library, called Damsel, that supports these motifs, provides efficient storage data layouts, incorporates optimizations to enable exascale operation, and is tolerant to failures; and (3) productizing Damsel and working with computational scientists to encourage adoption of this library by the scientific community. The product of this project, the Damsel library, is openly available for download from http://cucis.ece.northwestern.edu/projects/DAMSEL. Several case studies and an application programming interface reference are also available to help new users learn to use the library.},
doi = {10.2172/1136619},
institution = {Northwestern Univ., Evanston, IL (United States)},
number = {DOE-NORTHWESTERN-SC-0005309},
place = {United States},
year = 2014,
month = 7
}
