U.S. Department of Energy
Office of Scientific and Technical Information

Streaming data analytics via message passing with application to graph algorithms

Journal Article · Journal of Parallel and Distributed Computing
 [1];  [1]
  1. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

The need to process streaming data, which arrives continuously at high volume in real time, arises in a variety of contexts, including data produced by experiments, collections of environmental or network sensors, and running simulations. Streaming data can also be formulated as queries or transactions that operate on a large dynamic data store, e.g., a distributed database. We describe a lightweight, portable framework named PHISH that enables a set of independent processes to compute on a stream of data in a distributed-memory parallel manner. Datums are routed between processes in patterns defined by the application. PHISH can run on top of either message passing via MPI or sockets via ZMQ. The former means streaming computations can be run on any parallel machine that supports MPI; the latter allows them to run on a heterogeneous, geographically dispersed network of machines. We illustrate how PHISH can support streaming MapReduce operations, and describe streaming versions of three algorithms for large, sparse graph analytics: triangle enumeration, subgraph isomorphism matching, and connected component finding. Finally, we provide benchmark timings for MPI versus socket performance of several kernel operations useful in streaming algorithms.
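
The routing idea in the abstract, where datums travel between processes in an application-defined pattern, can be illustrated with a small single-process simulation. The sketch below is not PHISH code and does not use the PHISH API; the names `route`, `stream_count`, and `merge` are hypothetical, in-memory queues stand in for MPI or ZMQ messages, and a word count stands in for a streaming MapReduce operation. It assumes a hashed routing pattern, in which every datum with the same key is sent to the same worker.

```python
import zlib
from collections import Counter, deque


def route(key, num_workers):
    """Pick a worker for a key with a stable hash (hypothetical helper)."""
    # zlib.crc32 is deterministic across runs, unlike Python's built-in hash().
    return zlib.crc32(key.encode("utf-8")) % num_workers


def stream_count(stream, num_workers=4):
    """Simulate a map stage feeding hashed keys to counting workers."""
    inboxes = [deque() for _ in range(num_workers)]    # stand-ins for message channels
    counts = [Counter() for _ in range(num_workers)]   # per-worker local state

    for datum in stream:
        # "Map" step: split the incoming datum into keys and route each one.
        for key in datum.split():
            inboxes[route(key, num_workers)].append(key)

        # Each worker drains its inbox and updates its own running totals.
        for worker, inbox in enumerate(inboxes):
            while inbox:
                counts[worker][inbox.popleft()] += 1

    return counts


def merge(counts):
    """Combine the per-worker totals into one result."""
    total = Counter()
    for local in counts:
        total.update(local)
    return total


if __name__ == "__main__":
    per_worker = stream_count(["a b a", "b c"], num_workers=3)
    print(merge(per_worker))
```

Because the hash is stable, each key is counted by exactly one worker, so the running totals never need to be reconciled between workers; only the final merge touches more than one worker's state.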

Research Organization:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
AC04-94AL85000
OSTI ID:
1062811
Report Number(s):
SAND2012--9495J; PII: S0743731514000884
Journal Information:
Journal Name: Journal of Parallel and Distributed Computing; Journal Volume: 74; Journal Issue: 8; ISSN 0743-7315
Publisher:
Elsevier
Country of Publication:
United States
Language:
English

Similar Records

Parallel Harness for Informatic Stream Hashing
Software · 2012 · OSTI ID:1231551

Parallel Harness for Informatic Stream Hashing
Software · 2012 · OSTI ID:code-6196

MapReduce MPI library (MR-MPI) v. 1.0
Software · 2009 · OSTI ID:1253283