DOE PAGES

Title: Large-scale seismic signal analysis with Hadoop

In seismology, waveform cross correlation has been used for years to produce high-precision hypocenter locations and to build sensitive detectors. Because correlated seismograms generally are found only at small hypocenter separation distances, correlation detectors have historically been reserved for spotlight purposes. However, many regions have been found to produce large numbers of correlated seismograms, and there is growing interest in building next-generation pipelines that employ correlation as a core part of their operation. In an effort to better understand the distribution and behavior of correlated seismic events, we have cross-correlated a global dataset consisting of over 300 million seismograms. This was done using a conventional distributed cluster, and required 42 days. In anticipation of processing much larger datasets, we have re-architected the system to run as a series of MapReduce jobs on a Hadoop cluster. In doing so we achieved a factor of 19 performance increase on a test dataset. We found that fundamental algorithmic transformations were required to achieve the maximum performance increase. In the original IO-bound implementation, we went to great lengths to minimize IO. In the Hadoop implementation, where IO is cheap, we were instead able to greatly increase the parallelism of our algorithms by performing a tiered series of very fine-grained (highly parallelizable) transformations on the data. Each of these MapReduce jobs required reading and writing large amounts of data.
Authors:
 [1] ;  [2] ;  [2] ;  [2]
  1. Google Inc., Mountain View, CA (United States)
  2. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Publication Date:
Grant/Contract Number:
AC52-07NA27344; LLNL-JRNL-644626
Type:
Published Article
Journal Name:
Computers and Geosciences
Additional Journal Information:
Journal Volume: 66; Journal Issue: C; Journal ID: ISSN 0098-3004
Publisher:
Elsevier
Research Org:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Org:
USDOE
Country of Publication:
United States
Language:
English
Subject:
58 GEOSCIENCES; correlation; Hadoop; MapReduce; seismology
OSTI Identifier:
1209709
Alternate Identifier(s):
OSTI ID: 1201566