Summary: Supporting XML Based High-Level Abstractions on HDF5
Datasets: A Case Study in Automatic Data Virtualization
Swarup Kumar Sahoo and Gagan Agrawal
Department of Computer Science and Engineering
Ohio State University, Columbus OH 43210
Abstract. Recently, we have been focusing on the notion of automatic data virtualization.
The goal is to enable automatic creation of efficient data services to support a high-level or
virtual view of the data. The application developers express the processing assuming this virtual
view, whereas the data is stored in a low-level format. The compiler uses the information
about the low-level layout and the relationship between the virtual and the low-level layouts
to generate efficient low-level data processing code.
In this paper, we describe a specific implementation of this approach. We provide XML-based
abstractions on datasets stored in the Hierarchical Data Format (HDF5). A high-level XML
Schema provides a logical view of the HDF5 dataset, hiding the actual layout details. Based
on this view, the processing is specified using XQuery, which is the XML Query language
developed by the World Wide Web Consortium (W3C). The HDF5 data layout is exposed
to the compiler using a low-level XML Schema. The relationship between the high-level and
low-level Schemas is exposed using a Mapping Schema.
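
As an informal illustration of the three descriptions above, the following Python sketch models a low-level layout as separate flat arrays, a mapping from virtual field names to those arrays, and a query written only against the resulting high-level view. It is not the paper's Schemas or generated code: the names `low_level`, `mapping`, `virtual_records`, and `query_pressure_above`, and all of the data, are assumptions made up for this sketch.

```python
# Hypothetical low-level layout: each field lives in its own flat array,
# loosely mimicking separate datasets stored inside one HDF5 file.
low_level = {
    "coords/x": [0.0, 1.0, 2.0],
    "coords/y": [0.0, 0.5, 1.0],
    "values/pressure": [10.0, 20.0, 30.0],
}

# Hypothetical mapping: virtual field name -> low-level array that holds it.
mapping = {
    "x": "coords/x",
    "y": "coords/y",
    "pressure": "values/pressure",
}


def virtual_records(storage, field_map):
    """Yield records of the high-level view by walking the low-level arrays."""
    # All mapped arrays are assumed to have the same length.
    length = len(storage[next(iter(field_map.values()))])
    for i in range(length):
        yield {field: storage[path][i] for field, path in field_map.items()}


def query_pressure_above(storage, field_map, threshold):
    """A query phrased against the virtual view only (fields x, y, pressure)."""
    return [
        (rec["x"], rec["y"])
        for rec in virtual_records(storage, field_map)
        if rec["pressure"] > threshold
    ]


result = query_pressure_above(low_level, mapping, 15.0)
print(result)
```

The query function never names a storage path. In this sketch only `mapping` ties the virtual fields to the stored arrays, which is the kind of separation that the high-level, low-level, and Mapping Schemas are described as providing.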
We describe how our compiler can generate efficient code to access and process HDF5 datasets