OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Final Technical Report - Proactive Data Containers for Scientific Storage

Abstract

Emerging HPC systems are expected to be deployed with an unprecedented level of complexity, due to a deep system memory/storage hierarchy and heterogeneity of the storage hardware. This hierarchy is expected to range from CPU cache through several levels of volatile memory to nonvolatile memory, traditional hard disks, and tape. Simple and efficient methods of data management and movement through this hierarchy are critical for scientific applications using exascale systems. Existing storage system and I/O (SSIO) technologies face severe challenges in dealing with these requirements. The POSIX and MPI I/O standards, which are the basis for existing I/O libraries and parallel file systems, present fundamental challenges in the areas of scalable metadata operations, semantics-based data movement performance tuning, asynchronous operation, and support for scalable consistency of distributed operations. Moving toward new paradigms for SSIO in the extreme-scale era, we proposed to investigate novel object-based data abstractions and storage mechanisms that take advantage of the deep storage hierarchy and enable proactive automated performance tuning. To achieve these overarching goals, we initiated an effort to develop a fundamentally new data abstraction, called Proactive Data Containers (PDC). A PDC is a container within a locus of storage (memory, NVRAM, disk, etc.) that stores science data in an object-centric manner. Managing data as objects enables powerful optimization opportunities for data movement and transformations. The R&D focus areas of this project are: 1) formulation of object-oriented PDCs and their mapping in different levels of the exascale storage hierarchy; 2) efficient strategies for moving data in deep storage hierarchies using PDCs; 3) techniques for transforming and reorganizing data based on application requirements; and 4) novel analysis paradigms for enabling data transformations and user-defined analysis on data in PDCs.
Toward achieving these overarching goals, we designed an object-centric application programming interface (API) for HPC, scalable metadata management for object-centric storage systems, and data movement optimizations such as Data Elevator for moving data between two levels of storage devices and TAPIOCA for efficient aggregation of data on compute nodes. We then implemented several components of the PDC system, including metadata management, data placement services, remote procedure calls, and data aggregation, and put them together into the overall PDC framework.

Authors:
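Soumagne, Jerome; Warren, Richard; Mu, Jingqing; Vishwanath, Venkat; Tessier, Francois; Byna, Suren; Koziol, Quincey; Tang, Houjun; Wang, Teng; Dong, Bin; Liu, Jialin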
Publication Date:
2019-12
Research Org.:
The HDF Group
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1577855
Report Number(s):
DOE-HDF-16454
DOE Contract Number:  
SC0016454
Resource Type:
Technical Report
Country of Publication:
United States
Language:
English

Citation Formats

Soumagne, Jerome, Warren, Richard, Mu, Jingqing, Vishwanath, Venkat, Tessier, Francois, Byna, Suren, Koziol, Quincey, Tang, Houjun, Wang, Teng, Dong, Bin, and Liu, Jialin. Final Technical Report - Proactive Data Containers for Scientific Storage. United States: N. p., 2019. Web. doi:10.2172/1577855.
Soumagne, Jerome, Warren, Richard, Mu, Jingqing, Vishwanath, Venkat, Tessier, Francois, Byna, Suren, Koziol, Quincey, Tang, Houjun, Wang, Teng, Dong, Bin, & Liu, Jialin. (2019). Final Technical Report - Proactive Data Containers for Scientific Storage. United States. https://doi.org/10.2172/1577855
Soumagne, Jerome, Warren, Richard, Mu, Jingqing, Vishwanath, Venkat, Tessier, Francois, Byna, Suren, Koziol, Quincey, Tang, Houjun, Wang, Teng, Dong, Bin, and Liu, Jialin. 2019. "Final Technical Report - Proactive Data Containers for Scientific Storage". United States. https://doi.org/10.2172/1577855. https://www.osti.gov/servlets/purl/1577855.
@article{osti_1577855,
title = {Final Technical Report - Proactive Data Containers for Scientific Storage},
author = {Soumagne, Jerome and Warren, Richard and Mu, Jingqing and Vishwanath, Venkat and Tessier, Francois and Byna, Suren and Koziol, Quincey and Tang, Houjun and Wang, Teng and Dong, Bin and Liu, Jialin},
abstractNote = {Emerging HPC systems are expected to be deployed with an unprecedented level of complexity, due to a deep system memory/storage hierarchy and heterogeneity of the storage hardware. This hierarchy is expected to range from CPU cache through several levels of volatile memory to nonvolatile memory, traditional hard disks, and tape. Simple and efficient methods of data management and movement through this hierarchy are critical for scientific applications using exascale systems. Existing storage system and I/O (SSIO) technologies face severe challenges in dealing with these requirements. The POSIX and MPI I/O standards, which are the basis for existing I/O libraries and parallel file systems, present fundamental challenges in the areas of scalable metadata operations, semantics-based data movement performance tuning, asynchronous operation, and support for scalable consistency of distributed operations. Moving toward new paradigms for SSIO in the extreme-scale era, we proposed to investigate novel object-based data abstractions and storage mechanisms that take advantage of the deep storage hierarchy and enable proactive automated performance tuning. To achieve these overarching goals, we initiated an effort to develop a fundamentally new data abstraction, called Proactive Data Containers (PDC). A PDC is a container within a locus of storage (memory, NVRAM, disk, etc.) that stores science data in an object-centric manner. Managing data as objects enables powerful optimization opportunities for data movement and transformations. The R&D focus areas of this project are: 1) formulation of object-oriented PDCs and their mapping in different levels of the exascale storage hierarchy; 2) efficient strategies for moving data in deep storage hierarchies using PDCs; 3) techniques for transforming and reorganizing data based on application requirements; and 4) novel analysis paradigms for enabling data transformations and user-defined analysis on data in PDCs.
Toward achieving these overarching goals, we designed an object-centric application programming interface (API) for HPC, scalable metadata management for object-centric storage systems, and data movement optimizations such as Data Elevator for moving data between two levels of storage devices and TAPIOCA for efficient aggregation of data on compute nodes. We then implemented several components of the PDC system, including metadata management, data placement services, remote procedure calls, and data aggregation, and put them together into the overall PDC framework.},
doi = {10.2172/1577855},
url = {https://www.osti.gov/biblio/1577855},
place = {United States},
year = {2019},
month = {12}
}