U.S. Department of Energy
Office of Scientific and Technical Information

Domain-Specific Type-Safe APIs for Hierarchical Scientific Data with Modern C++

Conference

General-purpose application programming interfaces (APIs) for self-describing hierarchical scientific data storage, such as those of the HDF5 and NetCDF libraries, traditionally validate their inputs only at runtime. Errors in entry existence and data types are therefore typically caught late in the development of higher-level, application-specific APIs. In this paper, we propose exploiting modern C++ metaprogramming features to add compile-time type safety and improve the interaction with a well-defined, metadata-rich scientific schema in domain-specific hierarchical datasets. We tackle two aspects of common use: (i) direct data access and (ii) flexible “in-memory” index models for efficient search and data processing. The proposed APIs use C++17’s auto template type deduction features, C++11’s enum class for type safety, and C-style preprocessor macros for generative templated code. We showcase the pros and cons of our initial work on the standard NeXus schema, which is built on top of HDF5 and used at several facilities around the world for annotating and storing experimental neutron scattering data. Extendable compile-time type-safe APIs are a desirable feature that can be indexed by any modern integrated development environment (IDE). Hence, such APIs can help ease the learning curve for domain scientists by offering a less error-prone software interaction that enhances the findability of their data without resorting to a domain-specific language (DSL).
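To make the combination of features named in the abstract concrete, the following is a minimal sketch and not the paper's actual API. It assumes C++17 and pairs an enum class of entries with a macro-generated traits table and a `template <auto>` accessor. Everything in it is hypothetical: the three schema rows, the names `SCHEMA_ENTRIES`, `Entry`, `EntryTraits`, `FakeFile` and `read_entry`, and the in-memory map that stands in for an HDF5 file so the example has no external dependencies.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical schema subset, one row per entry: X(enumerator, path, C++ type).
#define SCHEMA_ENTRIES(X)                                   \
  X(Title, "/entry/title", std::string)                     \
  X(Duration, "/entry/duration", double)                    \
  X(TotalCounts, "/entry/total_counts", std::uint64_t)

// C++11 enum class: scoped enumerators with no implicit conversion from int.
enum class Entry {
#define X(name, path, type) name,
  SCHEMA_ENTRIES(X)
#undef X
};

// The primary template is left undefined, so an entry without a schema row
// cannot be used. The macro generates one specialization per row.
template <Entry E> struct EntryTraits;

#define X(name, path_, type_)                        \
  template <> struct EntryTraits<Entry::name> {      \
    using type = type_;                              \
    static constexpr const char* path = path_;       \
  };
SCHEMA_ENTRIES(X)
#undef X

// Assumption: an in-memory map stands in for a real HDF5 file handle.
using Value = std::variant<std::string, double, std::uint64_t>;
using FakeFile = std::map<std::string, Value>;

// C++17 `template <auto>`: the entry is a compile-time argument, and the
// return type is looked up from the schema rather than named by the caller.
template <auto E>
typename EntryTraits<E>::type read_entry(const FakeFile& file) {
  using T = typename EntryTraits<E>::type;
  // In this stand-in, a missing key or mismatched stored type still throws
  // at runtime (std::out_of_range / std::bad_variant_access).
  return std::get<T>(file.at(EntryTraits<E>::path));
}

static_assert(std::is_same_v<EntryTraits<Entry::Duration>::type, double>);

// Builds a small sample "file" keyed by the schema paths.
inline FakeFile make_sample_file() {
  return FakeFile{
      {EntryTraits<Entry::Title>::path, std::string("run 42")},
      {EntryTraits<Entry::Duration>::path, 3600.0},
      {EntryTraits<Entry::TotalCounts>::path, std::uint64_t{1200}}};
}

// Usage sketch: result types follow from the schema row.
inline void demo() {
  const FakeFile file = make_sample_file();
  const auto title = read_entry<Entry::Title>(file);  // std::string
  const double duration = read_entry<Entry::Duration>(file);
  assert(title == "run 42" && duration == 3600.0);
  // read_entry<Entry::Sample>(file);  // would not compile: no such enumerator
  // read_entry<42>(file);             // would not compile: 42 is not an Entry
}
```

In this sketch, a misspelled entry name or a value that is not an `Entry` is rejected by the compiler, and an IDE can list the enumerators for completion. What remains a runtime concern is whether a given file actually contains the entry with the stored type the schema declares.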

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE; USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22)
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1899022
Country of Publication:
United States
Language:
English

Similar Records

HDF5-FastQuery: An API for Simplifying Access to Data Storage, Retrieval, Indexing and Querying
Technical Report · 2006 · OSTI ID:888964

DARMA v. Beta 0.5
Software · 2017 · OSTI ID:1349227

Automatic Offloading C++ Expression Templates to CUDA Enabled GPUs
Conference · 2012 · OSTI ID:1080421
