DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Streaming data analytics via message passing with application to graph algorithms

Abstract

The need to process streaming data, which arrives continuously at high volume and in real time, arises in a variety of contexts, including data produced by experiments, collections of environmental or network sensors, and running simulations. Streaming data can also be formulated as queries or transactions that operate on a large dynamic data store, e.g. a distributed database. We describe a lightweight, portable framework named PHISH, which enables a set of independent processes to compute on a stream of data in a distributed-memory parallel manner. Datums are routed between processes in patterns defined by the application. PHISH can run on top of either message passing via MPI or sockets via ZMQ. The former means streaming computations can be run on any parallel machine that supports MPI; the latter allows them to run on a heterogeneous, geographically dispersed network of machines. We illustrate how PHISH can support streaming MapReduce operations, and describe streaming versions of three algorithms for large, sparse graph analytics: triangle enumeration, subgraph isomorphism matching, and connected component finding. Lastly, we provide benchmark timings for MPI versus socket performance of several kernel operations useful in streaming algorithms.
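The idea of routing datums between processes so that a streaming MapReduce works can be illustrated with a small single-process simulation. This is a minimal sketch, not PHISH's API: the functions `route`, `mapper`, and `stream_wordcount` are hypothetical, "processes" are modeled as in-memory counters rather than MPI ranks or ZMQ sockets, and word counting stands in for an arbitrary application. The key property it shows is hashed routing: every occurrence of a given key is sent to the same reducer, so each reducer can keep a running result for its keys as the stream arrives.

```python
import zlib
from collections import Counter
from typing import Iterable, List


def route(key: str, nreducers: int) -> int:
    """Pick the reducer that owns `key`.

    Uses CRC32 instead of Python's built-in hash(), whose string hashing is
    randomized per interpreter run, so routing stays stable across processes.
    """
    return zlib.crc32(key.encode("utf-8")) % nreducers


def mapper(line: str) -> Iterable[str]:
    """Map stage: split one incoming datum (a line of text) into keys (words)."""
    return line.split()


def stream_wordcount(lines: Iterable[str], nreducers: int = 4) -> List[Counter]:
    """Simulated streaming MapReduce: each reducer keeps running counts
    for exactly the keys that hash to it."""
    reducers = [Counter() for _ in range(nreducers)]
    for line in lines:                     # datums arrive one at a time
        for word in mapper(line):
            # "Send" the key-value pair (word, 1) to its owning reducer.
            reducers[route(word, nreducers)][word] += 1
    return reducers


if __name__ == "__main__":
    stream = ["the quick brown fox", "the lazy dog", "the fox"]
    for rank, counts in enumerate(stream_wordcount(stream, nreducers=3)):
        print(rank, dict(counts))
```

In a real deployment each reducer would be a separate process and the `+= 1` would be a message send; because the reduction is incremental, no process ever needs to see the whole stream.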

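Connected component finding on an edge stream can be sketched, in its simplest serial form, with an incremental union-find structure. This swaps in a standard technique for illustration and is not the distributed algorithm described in the paper; the class `StreamingComponents` and its methods are hypothetical. Each arriving edge either merges two components or is discarded as redundant, so the component count is always current.

```python
from typing import Dict, Hashable


class StreamingComponents:
    """Incrementally maintain connected components as edges arrive (union-find)."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}
        self.size: Dict[Hashable, int] = {}
        self.count = 0  # number of components among vertices seen so far

    def _add(self, v: Hashable) -> None:
        """Register a previously unseen vertex as its own component."""
        if v not in self.parent:
            self.parent[v] = v
            self.size[v] = 1
            self.count += 1

    def find(self, v: Hashable) -> Hashable:
        """Return the root of v's component (registering v if unseen)."""
        self._add(v)
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every vertex on the path directly at the root.
        while v != root:
            nxt = self.parent[v]
            self.parent[v] = root
            v = nxt
        return root

    def add_edge(self, u: Hashable, v: Hashable) -> bool:
        """Process one streamed edge; return True if it merged two components."""
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return False
        if self.size[ru] < self.size[rv]:  # union by size keeps trees shallow
            ru, rv = rv, ru
        self.parent[rv] = ru
        self.size[ru] += self.size[rv]
        self.count -= 1
        return True

    def same(self, u: Hashable, v: Hashable) -> bool:
        """True if u and v are currently in the same component."""
        return self.find(u) == self.find(v)


if __name__ == "__main__":
    cc = StreamingComponents()
    for u, v in [(1, 2), (3, 4), (2, 3), (5, 6)]:
        cc.add_edge(u, v)
    print(cc.count, cc.same(1, 4), cc.same(1, 5))
```

Union-find handles only edge insertions; a stream that also deletes edges would need a different approach.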
Authors:
Plimpton, Steven J. [1]; Shead, Tim [1]
  1. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Publication Date:
2014-05-06
Research Org.:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA)
OSTI Identifier:
1062811
Report Number(s):
SAND2012-9495J
Journal ID: ISSN 0743-7315; PII: S0743731514000884
Grant/Contract Number:  
AC04-94AL85000
Resource Type:
Accepted Manuscript
Journal Name:
Journal of Parallel and Distributed Computing
Additional Journal Information:
Journal Volume: 74; Journal Issue: 8; Journal ID: ISSN 0743-7315
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Streaming data; Graph algorithms; Message passing; MPI; Sockets; MapReduce

Citation Formats

Plimpton, Steven J., and Shead, Tim. Streaming data analytics via message passing with application to graph algorithms. United States: N. p., 2014. Web. doi:10.1016/j.jpdc.2014.04.001.
Plimpton, Steven J., & Shead, Tim. Streaming data analytics via message passing with application to graph algorithms. United States. https://doi.org/10.1016/j.jpdc.2014.04.001
Plimpton, Steven J., and Shead, Tim. 2014. "Streaming data analytics via message passing with application to graph algorithms". United States. https://doi.org/10.1016/j.jpdc.2014.04.001. https://www.osti.gov/servlets/purl/1062811.
@article{osti_1062811,
title = {Streaming data analytics via message passing with application to graph algorithms},
author = {Plimpton, Steven J. and Shead, Tim},
abstractNote = {The need to process streaming data, which arrives continuously at high-volume in real-time, arises in a variety of contexts including data produced by experiments, collections of environmental or network sensors, and running simulations. Streaming data can also be formulated as queries or transactions which operate on a large dynamic data store, e.g. a distributed database. We describe a lightweight, portable framework named PHISH which enables a set of independent processes to compute on a stream of data in a distributed-memory parallel manner. Datums are routed between processes in patterns defined by the application. PHISH can run on top of either message-passing via MPI or sockets via ZMQ. The former means streaming computations can be run on any parallel machine which supports MPI; the latter allows them to run on a heterogeneous, geographically dispersed network of machines. We illustrate how PHISH can support streaming MapReduce operations, and describe streaming versions of three algorithms for large, sparse graph analytics: triangle enumeration, subgraph isomorphism matching, and connected component finding. Lastly, we also provide benchmark timings for MPI versus socket performance of several kernel operations useful in streaming algorithms.},
doi = {10.1016/j.jpdc.2014.04.001},
journal = {Journal of Parallel and Distributed Computing},
number = 8,
volume = 74,
place = {United States},
year = {2014},
month = {May}
}


Citation Metrics:
Cited by 10 works (citation information provided by Web of Science)
