OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Scalable Metadata Management for a Large Multi-Source Seismic Data Repository

Technical Report · DOI: https://doi.org/10.2172/1357348 · OSTI ID: 1357348
 [1];  [1];  [1];  [1];  [1]
  1. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

In this work, we implemented the key metadata management components of a scalable seismic data ingestion framework to address limitations in our existing system and to position it for anticipated growth in volume and complexity. We began with an assessment of open-source data flow tools from the Hadoop ecosystem. We then started constructing a layered architecture specifically designed to address many of the scalability and data quality issues we experience with our current pipeline. This included implementing basic functionality in each layer, such as establishing a data lake, designing a unified metadata schema, tracking provenance, and calculating data quality metrics. Our original intent was to test and validate the new ingestion framework with data from a large-scale field deployment in a temporary network. This delivered somewhat unsatisfying results, because the new system identified fatal flaws in the data relatively early in the pipeline. Although this is a correct result, it did not allow us to sufficiently exercise the whole framework. We then widened our scope to process all available metadata from over a dozen online seismic data sources to further test the implementation and validate the design. This experiment also uncovered a higher-than-expected frequency of certain types of metadata issues, which challenged us to further tune our data management strategy to handle them. The result of this project is a greatly improved understanding of real-world data issues, a validated design, and prototype implementations of major components of an eventual production framework. This forms the basis of future development for the Geophysical Monitoring Program data pipeline, which is a critical asset supporting multiple programs. It also positions us very well to deliver valuable metadata management expertise to our sponsors, and has already resulted in an NNSA Office of Defense Nuclear Nonproliferation commitment to a multi-year project for follow-on work.

Research Organization: Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Organization: USDOE
DOE Contract Number: AC52-07NA27344
OSTI ID: 1357348
Report Number(s): LLNL-TR-729885
Country of Publication: United States
Language: English