DOE PAGES · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Data Pallets: Containerizing Storage For Reproducibility and Traceability.

Journal Article · Lecture Notes in Computer Science
 [1];  [2];  [1]
  1. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
  2. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Georgia Inst. of Technology, Atlanta, GA (United States)

Trusting simulation output is crucial for Sandia’s mission objectives. We rely on these simulations to perform our high-consequence mission tasks given national treaty obligations. Other science and modeling applications, while they may not have high-consequence results, still require the strongest levels of trust so that results can serve as the foundation for both practical applications and future research. To this end, the computing community has developed workflow and provenance systems to aid both in automating simulation and modeling execution and in determining exactly how some output was created, so that conclusions can be drawn from the data. Current workflow and provenance systems all operate at the user level and have little to no system-level support, making them fragile, difficult to use, and incomplete. The introduction of container technology is a first step towards encapsulating and tracking the artifacts used in creating data and the resulting insights, but current container implementations focus solely on making it easy to deploy an application in an isolated “sandbox” and on maintaining a strictly read-only mode to avoid any potential changes to the application. All storage activities still use the system-level shared storage. This project explores extending the container concept to include storage as a new container type we call Data Pallets. Data Pallets are potentially writeable, automatically generated by the system based on IO activities, and usable as a way to link the contained data back to the application and input deck used to create it.

Research Organization:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA), Office of Defense Nuclear Security
Grant/Contract Number:
AC04-94AL85000; NA0003525
OSTI ID:
1595037
Report Number(s):
SAND-2018-12861J; 669886
Journal Information:
Lecture Notes in Computer Science, Vol. 11887; ISSN 0302-9743
Publisher:
Springer
Country of Publication:
United States
Language:
English

References (6)

NetCDF: an interface for scientific data access
  • journal, July 1990
Scientific workflow management and the Kepler system
  • Ludäscher, Bertram; Altintas, Ilkay; Berkley, Chad
  • Concurrency and Computation: Practice and Experience, Vol. 18, Issue 10 https://doi.org/10.1002/cpe.994
  • journal, January 2006
Flexible IO and integration for scientific codes through the adaptable IO system (ADIOS)
  • Lofstead, Jay F.; Klasky, Scott; Schwan, Karsten
  • Proceedings of the 6th international workshop on Challenges of large applications in distributed environments - CLADE '08 https://doi.org/10.1145/1383529.1383533
  • conference, January 2008
Singularity: Scientific containers for mobility of compute
  • journal, May 2017
Swift: A language for distributed parallel scripting
  • journal, September 2011
Pegasus, a workflow management system for science automation
  • journal, May 2015