High Performance Computing Based Parallel Hierarchical Modal Association Clustering (HPAR HMAC)
- Computational Sciences and Engineering Division, ORNL, Oak Ridge, TN 37830
For many applications, clustering is a crucial step in gaining insight into the makeup of a dataset. The best approach to a given problem often depends on a variety of factors, such as the size of the dataset, time restrictions, and soft clustering requirements. The HMAC algorithm seeks to combine the strengths of two clustering approaches: model-based and linkage-based clustering. One particular weakness of HMAC is its computational complexity, which makes it impractical for mega-scale data clustering. For high-definition imagery, a user would have to wait months or years for a result; for a 16-megapixel image, the estimated runtime exceeds a decade. To improve the execution time of HMAC, it is reasonable to consider a multi-core implementation that utilizes available system resources. An existing implementation (Ray and Cheng 2014) divides the dataset into N partitions, one for each thread, prior to executing the HMAC algorithm. This implementation benefits from two types of optimization: parallelization and divide-and-conquer. By running each partition in parallel, the program accelerates computation by utilizing more system resources. Although the parallel implementation provides considerable improvement over the serial HMAC, it still suffers from quadratic computational complexity, O(N^2). Once the maximum number of cores on a system is exhausted, the program exhibits slower behavior. We now consider a modification to HMAC that involves a recursive partitioning scheme. Our modification aims to exploit the divide-and-conquer benefits seen in the parallel HMAC implementation. At each level in the recursion tree, partitions are divided into two sub-partitions until a threshold size is reached. When a partition can no longer be divided without falling below the threshold size, the base HMAC algorithm is applied. This results in a significant speedup over the parallel HMAC.
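The recursive partitioning described in the abstract can be illustrated with a minimal Python sketch. It is not the HPAR HMAC implementation: the function name `recursive_partition`, the split-at-the-midpoint rule, and the `base_cluster` callable (a stand-in for the base HMAC step) are all assumptions made for illustration. The record does not say how partitions are chosen or how leaf results are combined, so the sketch only shows the recursion and the threshold test.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def recursive_partition(
    points: Sequence[T],
    threshold: int,
    base_cluster: Callable[[Sequence[T]], R],
) -> List[R]:
    """Halve `points` recursively and run `base_cluster` on each leaf.

    Assumption: a partition is split at its midpoint. The split stops
    as soon as the smaller half would fall below `threshold`.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    half = len(points) // 2
    if half < threshold:
        # Cannot divide further without going below the threshold:
        # apply the base algorithm (hypothetical stand-in for HMAC).
        return [base_cluster(points)]
    left = recursive_partition(points[:half], threshold, base_cluster)
    right = recursive_partition(points[half:], threshold, base_cluster)
    return left + right


def pairwise_work(leaves: Sequence[Sequence[T]]) -> int:
    """Sum of squared leaf sizes, a rough proxy for quadratic cost."""
    return sum(len(leaf) ** 2 for leaf in leaves)


# Usage: the identity "clusterer" below only exposes the leaf partitions.
leaves = recursive_partition(list(range(10)), threshold=2, base_cluster=list)
```

With 10 points and a threshold of 2, the sketch yields four leaves of sizes 2, 3, 2, and 3. Under a quadratic per-partition cost, the proxy work drops from 100 (one partition of 10) to 26, which is the divide-and-conquer effect the abstract refers to. The cost of combining leaf results is not modeled here.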
- Short Name / Acronym:
- HPAR HMAC; 005334WKSTN00
- Version:
- 00
- Programming Language(s):
- Medium: X; OS: LINUX
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization:
- USDOE
- Contributing Organization:
- Dilip R. Patlolla and Sujithkumar Surendran Nair
- DOE Contract Number:
- AC05-00OR22725
- OSTI ID:
- 1365649
- Country of Origin:
- United States