Seismology in the Cloud: A New Streaming Workflow
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
- Incorporated Research Inst. for Seismology (IRIS) Data Services, Seattle, WA (United States)
Data-intensive research in seismology is experiencing a recent boom, driven in part by large volumes of available data and advances in the growing field of data science. However, there are significant barriers to processing large data volumes, such as long retrieval times from data repositories, complex data management, and limited computational resources. New tools and platforms have reduced the barriers to entry for scientific cluster computing, including the maturation of the commercial cloud as an accessible instrument for research. Here, we build a customized research cluster in the cloud to test a new workflow for large-scale seismic analysis, in which data are processed as a stream (retrieved on-the-fly and acted upon without storing), with data from the Incorporated Research Institutions for Seismology Data Management Center. We use this workflow to deploy a spectral peak detection algorithm over 5.6 TB of compressed continuous seismic data from 2074 stations of the USArray Transportable Array EarthScope network. Using a 50-node cluster in the cloud, we completed the noise survey in 80 hr, with an average data throughput of 1.7 GB per minute. By varying cluster sizes, we find the scaling of our analysis to be sublinear, due to a combination of algorithmic limitations and data center response times. The cloud-based streaming workflow represents an order-of-magnitude increase in acquisition and processing speed compared to a traditional download-store-process workflow, and offers the additional benefits of employing a flexible, accessible, and widely used computing architecture. It is limited, however, due to its reliance on Internet transfer speeds and data center service capacity, and may not work well for repeated analyses or those for which even higher data throughputs are needed. These research applications will require a new class of cloud-native approaches in which both data and analysis are in the cloud.
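The streaming pattern described in the abstract (retrieve a data segment, analyze it in memory, keep only the result) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `stream_process`, `dominant_frequency`, and the `fetch` callable are hypothetical, the naive DFT is only a stand-in for the paper's spectral peak detection algorithm, and a single-machine thread pool stands in for a multi-node cloud cluster. In a real deployment, `fetch` would presumably wrap a request to a data center web service.

```python
import cmath
from concurrent.futures import ThreadPoolExecutor


def dominant_frequency(samples, sample_rate):
    """Stand-in analysis: frequency (Hz) of the strongest non-DC bin of a naive DFT."""
    n = len(samples)
    if n < 2:
        return 0.0  # too short to resolve any non-DC frequency
    best_bin, best_power = 0, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(samples))
        power = abs(coeff) ** 2
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin * sample_rate / n


def stream_process(requests, fetch, sample_rate, workers=4):
    """Fetch each request, analyze it in memory, and keep only the small result.

    `fetch` is a hypothetical callable that returns a list of samples for one
    request (for example, one station and time window). Nothing is written to disk.
    """
    def work(request):
        samples = fetch(request)  # retrieved on the fly
        # Only the scalar result is returned; the raw samples go out of scope here.
        return request, dominant_frequency(samples, sample_rate)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(work, requests))
```

Because the fetch step is mostly waiting on the network, overlapping several requests with a thread pool is one simple way to keep the analysis step busy. The abstract's observation that scaling is sublinear is consistent with this kind of design, where the remote service response time bounds the achievable throughput.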
- Research Organization:
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
- Sponsoring Organization:
- USDOE National Nuclear Security Administration (NNSA), Office of Defense Nuclear Nonproliferation
- Grant/Contract Number:
- 89233218CNA000001
- OSTI ID:
- 1688754
- Report Number(s):
- LA-UR-19-31275
- Journal Information:
- Seismological Research Letters, Vol. 91, Issue 3; ISSN 0895-0695
- Publisher:
- Seismological Society of America
- Country of Publication:
- United States
- Language:
- English