DOE PAGES

Title: Streaming data analytics via message passing with application to graph algorithms

The need to process streaming data, which arrives continuously at high volume in real time, arises in a variety of contexts, including data produced by experiments, collections of environmental or network sensors, and running simulations. Streaming data can also be formulated as queries or transactions that operate on a large dynamic data store, e.g. a distributed database. We describe a lightweight, portable framework named PHISH that enables a set of independent processes to compute on a stream of data in a distributed-memory parallel manner. Datums are routed between processes in patterns defined by the application. PHISH can run on top of either message passing via MPI or sockets via ZMQ. The former means streaming computations can be run on any parallel machine that supports MPI; the latter allows them to run on a heterogeneous, geographically dispersed network of machines. We illustrate how PHISH can support streaming MapReduce operations, and describe streaming versions of three algorithms for large, sparse graph analytics: triangle enumeration, subgraph isomorphism matching, and connected component finding. Lastly, we provide benchmark timings for MPI versus socket performance of several kernel operations useful in streaming algorithms.
Authors:
 Plimpton, Steven J. [1]; Shead, Tim [1]
  1. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Publication Date:
Report Number(s):
SAND2012-9495J
Journal ID: ISSN 0743-7315; PII: S0743731514000884
Grant/Contract Number:
AC04-94AL85000
Type:
Accepted Manuscript
Journal Name:
Journal of Parallel and Distributed Computing
Additional Journal Information:
Journal Volume: 74; Journal Issue: 8; Journal ID: ISSN 0743-7315
Publisher:
Elsevier
Research Org:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org:
USDOE National Nuclear Security Administration (NNSA)
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Streaming data; Graph algorithms; Message passing; MPI; Sockets; MapReduce
OSTI Identifier:
1062811

Plimpton, Steven J., and Shead, Tim. Streaming data analytics via message passing with application to graph algorithms. United States: N. p., 2014. Web. doi:10.1016/j.jpdc.2014.04.001.
Plimpton, Steven J., & Shead, Tim. (2014). Streaming data analytics via message passing with application to graph algorithms. United States. doi:10.1016/j.jpdc.2014.04.001.
Plimpton, Steven J., and Shead, Tim. 2014. "Streaming data analytics via message passing with application to graph algorithms". United States. doi:10.1016/j.jpdc.2014.04.001. https://www.osti.gov/servlets/purl/1062811.
@article{osti_1062811,
title = {Streaming data analytics via message passing with application to graph algorithms},
author = {Plimpton, Steven J. and Shead, Tim},
abstractNote = {The need to process streaming data, which arrives continuously at high volume in real time, arises in a variety of contexts, including data produced by experiments, collections of environmental or network sensors, and running simulations. Streaming data can also be formulated as queries or transactions that operate on a large dynamic data store, e.g. a distributed database. We describe a lightweight, portable framework named PHISH that enables a set of independent processes to compute on a stream of data in a distributed-memory parallel manner. Datums are routed between processes in patterns defined by the application. PHISH can run on top of either message passing via MPI or sockets via ZMQ. The former means streaming computations can be run on any parallel machine that supports MPI; the latter allows them to run on a heterogeneous, geographically dispersed network of machines. We illustrate how PHISH can support streaming MapReduce operations, and describe streaming versions of three algorithms for large, sparse graph analytics: triangle enumeration, subgraph isomorphism matching, and connected component finding. Lastly, we provide benchmark timings for MPI versus socket performance of several kernel operations useful in streaming algorithms.},
doi = {10.1016/j.jpdc.2014.04.001},
journal = {Journal of Parallel and Distributed Computing},
number = 8,
volume = 74,
place = {United States},
year = {2014},
month = {5}
}