Scalable Metadata Management for a Large Multi-Source Seismic Data Repository

Gaylord, J. M.; Dodge, D. A.; Magana-Zook, S. A.; Barno, J. G.; Knapp, D. R.

doi:10.2172/1357348

Technical Report · April 11, 2017

DOI: https://doi.org/10.2172/1357348 · OSTI ID: 1357348

Gaylord, J. M. [1]; Dodge, D. A. [1]; Magana-Zook, S. A. [1]; Barno, J. G. [1]; Knapp, D. R. [1]

[1] Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

In this work, we implemented the key metadata management components of a scalable seismic data ingestion framework to address limitations in our existing system and to position it for anticipated growth in volume and complexity. We began with an assessment of open-source data flow tools from the Hadoop ecosystem. We then began constructing a layered architecture specifically designed to address many of the scalability and data quality issues we experience with our current pipeline. This included implementing basic functionality in each of the layers, such as establishing a data lake, designing a unified metadata schema, tracking provenance, and calculating data quality metrics. Our original intent was to test and validate the new ingestion framework with data from a large-scale field deployment in a temporary network. This delivered somewhat unsatisfying results, since the new system identified fatal flaws in the data relatively early in the pipeline. Although this is a correct result, it did not allow us to sufficiently exercise the whole framework. We then widened our scope to process all available metadata from over a dozen online seismic data sources to further test the implementation and validate the design. This experiment also uncovered a higher-than-expected frequency of certain types of metadata issues, which challenged us to further tune our data management strategy to handle them. The result of this project is a greatly improved understanding of real-world data issues, a validated design, and prototype implementations of major components of an eventual production framework. This forms the basis of future development for the Geophysical Monitoring Program data pipeline, which is a critical asset supporting multiple programs. It also positions us very well to deliver valuable metadata management expertise to our sponsors, and has already resulted in an NNSA Office of Defense Nuclear Nonproliferation commitment to a multi-year project for follow-on work.

Research Organization: Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

Sponsoring Organization: USDOE

DOE Contract Number: AC52-07NA27344

OSTI ID: 1357348

Report Number(s): LLNL-TR-729885

Country of Publication: United States

Language: English

Similar Records

ENHANCING SEISMIC CALIBRATION RESEARCH THROUGH SOFTWARE AUTOMATION AND SCIENTIFIC INFORMATION MANAGEMENT

Conference · July 3, 2008 · OSTI ID: 1357348

Ruppert, S; Dodge, D A; Ganzberger, M D; +2 more

ENHANCING SEISMIC CALIBRATION RESEARCH THROUGH SOFTWARE AUTOMATION AND SCIENTIFIC INFORMATION MANAGEMENT

Conference · July 6, 2007 · OSTI ID: 1357348

Ruppert, S D; Dodge, D A; Ganzberger, M D; +2 more

A Flexible Online Metadata Editing and Management System

Journal Article · January 1, 2010 · Ecological Informatics · OSTI ID: 1357348

Aguilar, Raul; Pan, Jerry Yun; Gries, Corinna; +2 more

Related Subjects

58 GEOSCIENCES
97 MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE
