U.S. Department of Energy
Office of Scientific and Technical Information

High Performance Computing Based Parallel Hierarchical Modal Association Clustering (HPAR HMAC)

Software ·
OSTI ID:1365649
; ;  [1]
  1. Computational Sciences and Engineering Division, ORNL, Oak Ridge, TN 37830

For many applications, clustering is a crucial step in gaining insight into the makeup of a dataset. The best approach to a given problem often depends on a variety of factors, such as the size of the dataset, time restrictions, and soft clustering requirements. The HMAC algorithm seeks to combine the strengths of two clustering approaches: model-based and linkage-based clustering. One particular weakness of HMAC is its computational complexity, which makes it impractical for mega-scale data clustering. For high-definition imagery, a user would have to wait months or years for a result; for a 16-megapixel image, the estimated runtime exceeds a decade. To improve the execution time of HMAC, it is reasonable to consider a multi-core implementation that utilizes available system resources. An existing implementation (Ray and Cheng 2014) divides the dataset into N partitions, one for each thread, prior to executing the HMAC algorithm. This implementation benefits from two types of optimization: parallelization and divide-and-conquer. By running the partitions in parallel, the program accelerates computation by utilizing more system resources. Although the parallel implementation provides considerable improvement over the serial HMAC, it still suffers from poor computational complexity, O(N^2). Once all of the cores on a system are in use, the program slows down. We now consider a modification to HMAC that involves a recursive partitioning scheme. Our modification aims to exploit the divide-and-conquer benefits seen in the parallel HMAC implementation. At each level in the recursion tree, partitions are divided into two sub-partitions until a threshold size is reached. When a partition can no longer be divided without falling below the threshold size, the base HMAC algorithm is applied to it. This results in a significant speedup over the parallel HMAC.
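
The recursive partitioning scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the HPAR HMAC code: the names recursive_apply and leaf_size are hypothetical, the split is a plain halving by index (a real implementation would presumably split spatially), leaf results are simply concatenated because the record does not say how they are merged, and the base HMAC step is replaced by a stand-in that only reports the size of each leaf partition.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def recursive_apply(
    data: Sequence[T],
    threshold: int,
    base: Callable[[Sequence[T]], List[R]],
) -> List[R]:
    """Halve `data` recursively; run `base` on partitions that cannot be split.

    A partition is split only if both halves stay at or above `threshold`.
    Concatenating leaf results is an assumption made for this sketch.
    """
    half = len(data) // 2
    # Stop when halving would leave a sub-partition smaller than the threshold.
    if half < threshold:
        return base(data)
    left = recursive_apply(data[:half], threshold, base)
    right = recursive_apply(data[half:], threshold, base)
    return left + right


def leaf_size(partition: Sequence[int]) -> List[int]:
    """Hypothetical stand-in for the base HMAC step: report the leaf size only."""
    return [len(partition)]


# 1000 points with a threshold of 100 points per partition.
sizes = recursive_apply(list(range(1000)), 100, leaf_size)

# If the base step costs on the order of m^2 for a partition of m points,
# the total leaf work is the sum of squared leaf sizes.
pairwise_work = sum(m * m for m in sizes)
unpartitioned_work = 1000 * 1000
```

With these inputs the recursion stops at eight partitions of 125 points each, so the quadratic leaf work is 8 x 125^2 = 125,000 units compared with 1,000,000 for the unpartitioned dataset. The leaves are independent of one another, so they could also be dispatched to separate threads or processes.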

Short Name / Acronym:
HPAR HMAC; 005334WKSTN00
Version:
00
Programming Language(s):
Medium: X; OS: LINUX
Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE
Contributing Organization:
Dilip R. Patlolla and Sujithkumar Surendran Nair
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1365649
Country of Origin:
United States
