Fault Tolerant Frequent Pattern Mining

Shohdy, Sameh; Vishnu, Abhinav; Agrawal, Gagan

doi:10.1109/HiPC.2016.012

Conference · December 19, 2016

DOI: https://doi.org/10.1109/HiPC.2016.012 · OSTI ID: 1345459

The FP-Growth algorithm is a Frequent Pattern Mining (FPM) algorithm that has been used extensively to study correlations and patterns in large-scale datasets. While several researchers have designed distributed-memory FP-Growth algorithms, it is pivotal to consider a fault-tolerant FP-Growth that can address the increasing fault rates in large-scale systems. In this work, we propose a novel parallel, algorithm-level fault-tolerant FP-Growth algorithm. We leverage algorithmic properties and advanced MPI features to guarantee O(1) space complexity, achieved by using the dataset's own memory space for checkpointing. We also propose a recovery algorithm that can use both in-memory and disk-based checkpointing; in many cases, recovery completes without any disk access and incurs no memory overhead for checkpointing. We evaluate our FT algorithm on a large-scale InfiniBand cluster with several large datasets using up to 2K cores. Our evaluation demonstrates excellent efficiency for checkpointing and recovery in comparison to the disk-based approach. We also observed a 20x average speed-up in comparison to Spark, establishing that a well-designed algorithm can easily outperform a solution based on a general fault-tolerant programming model.
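For readers unfamiliar with FP-Growth, the following is a minimal, sequential sketch of its first stage: building the FP-tree in two passes over the transactions. It is a generic textbook illustration, not the parallel, fault-tolerant algorithm proposed in the paper. The names `Node`, `build_fp_tree`, and `min_support` are hypothetical and chosen only for this example.

```python
from collections import Counter


class Node:
    """One FP-tree node: an item, its count, and its children keyed by item."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree (illustrative sketch, single process, in memory)."""
    # Pass 1: count in how many transactions each item appears.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_support}

    root = Node(None)
    # Pass 2: insert each transaction, keeping only frequent items and
    # ordering them by descending support (ties broken by item name) so
    # that transactions sharing frequent items also share tree prefixes.
    for t in transactions:
        items = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in items:
            if item not in node.children:
                node.children[item] = Node(item, node)
            node = node.children[item]
            node.count += 1
    return root, frequent


transactions = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]]
root, frequent = build_fp_tree(transactions, min_support=2)
```

In this toy input every transaction contains `b`, so all four share a single `b` prefix node. This prefix sharing is what makes the tree a compressed representation of the dataset.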

Research Organization: Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

Sponsoring Organization: USDOE

DOE Contract Number: AC05-76RL01830

OSTI ID: 1345459

Report Number(s): PNNL-SA-120772

Resource Relation: Conference: IEEE 23rd International Conference on High Performance Computing (HiPC), December 19-22, 2016, Hyderabad, India, 12-21

Country of Publication: United States

Language: English

Similar Records

Designing a Scalable Fault Tolerance Model for High Performance Computational Chemistry: A Case Study with Coupled Cluster Perturbative Triples

Journal Article · January 11, 2011 · Journal of Chemical Theory and Computation, 7(1):66-75 · OSTI ID:1345459

van Dam, Hubertus JJ; Vishnu, Abhinav; De Jong, Wibe A

Large Scale Frequent Pattern Mining using MPI One-Sided Model

Conference · September 8, 2015 · OSTI ID:1345459

Vishnu, Abhinav; Agarwal, Khushbu

Scalable Transparent Checkpoint-Restart of Global Address Space Applications on Virtual Machines over Infiniband

Conference · May 18, 2009 · OSTI ID:1345459

Villa, Oreste; Krishnamoorthy, Sriram; Nieplocha, Jaroslaw; +1 more
