Title: Proactive Data Containers for Scientific Storage (Final Report)

Technical Report · DOI: https://doi.org/10.2172/1577855 · OSTI ID: 1577855
 [1];  [1];  [1];  [1];  [2];  [2];  [3];  [3];  [3];  [3];  [3];  [3]
  1. The HDF Group, Champaign, IL (United States)
  2. Argonne National Laboratory (ANL), Argonne, IL (United States)
  3. Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)

Emerging HPC systems are expected to be deployed with an unprecedented level of complexity, due to a deep system memory/storage hierarchy and heterogeneity of the storage hardware. This hierarchy is expected to range from CPU cache through several levels of volatile memory to nonvolatile memory, traditional hard disks, and tape. Simple and efficient methods of data management and movement through this hierarchy are critical for scientific applications using exascale systems. Existing storage system and I/O (SSIO) technologies face severe challenges in dealing with these requirements. The POSIX and MPI I/O standards, which are the basis for existing I/O libraries and parallel file systems, present fundamental challenges in the areas of scalable metadata operations, semantics-based data movement performance tuning, asynchronous operation, and support for scalable consistency of distributed operations. Moving toward new paradigms for SSIO in the extreme-scale era, we proposed to investigate novel object-based data abstractions and storage mechanisms that take advantage of the deep storage hierarchy and enable proactive automated performance tuning. To achieve these goals, we initiated an effort to develop a fundamentally new data abstraction, called Proactive Data Containers (PDC). A PDC is a container within a locus of storage (memory, NVRAM, disk, etc.) that stores science data in an object-centric manner. Managing data as objects enables powerful optimization opportunities for data movement and transformations. The R&D foci of this project are: 1) formulation of object-oriented PDCs and their mapping in different levels of the exascale storage hierarchy; 2) efficient strategies for moving data in deep storage hierarchies using PDCs; 3) techniques for transforming and reorganizing data based on application requirements; and 4) novel analysis paradigms for enabling data transformations and user-defined analysis on data in PDCs.
Toward these goals, we designed an object-centric application programming interface (API) for HPC, scalable metadata management for object-centric storage systems, and data movement optimizations such as Data Elevator, for moving data between two levels of storage devices, and TAPIOCA, for efficient aggregation of data on compute nodes. We then implemented several components of the PDC system, including metadata management, data placement services, remote procedure calls, and data aggregation, and integrated them into the overall PDC framework.

Research Organization: The HDF Group, Champaign, IL (United States)
Sponsoring Organization: USDOE Office of Science (SC)
DOE Contract Number: SC0016454
OSTI ID: 1577855
Report Number(s): DOE-HDF-16454
Country of Publication: United States
Language: English
