OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: High Performance Computing Based Parallel HIearchical Modal Association Clustering (HPAR HMAC)

Abstract

For many applications, clustering is a crucial step in gaining insight into the makeup of a dataset. The best approach to a given problem often depends on a variety of factors, such as the size of the dataset, time restrictions, and soft clustering requirements. The HMAC algorithm seeks to combine the strengths of two clustering approaches: model-based and linkage-based clustering. One particular weakness of HMAC is its computational complexity, which makes it impractical for mega-scale data clustering. For high-definition imagery, a user would have to wait months or years for a result; for a 16-megapixel image, the estimated runtime exceeds a decade. To improve the execution time of HMAC, it is reasonable to consider a multi-core implementation that utilizes available system resources. An existing implementation (Ray and Cheng 2014) divides the dataset into N partitions, one for each thread, prior to executing the HMAC algorithm. This implementation benefits from two types of optimization: parallelization and divide-and-conquer. By running each partition in parallel, the program accelerates computation by utilizing more system resources. Although the parallel implementation provides considerable improvement over the serial HMAC, it still suffers from poor computational complexity, O(N^2). Once the maximum number of cores on a system is exhausted, performance degrades. We now consider a modification to HMAC that involves a recursive partitioning scheme. Our modification aims to exploit the divide-and-conquer benefits seen in the parallel HMAC implementation. At each level in the recursion tree, partitions are divided into two sub-partitions until a threshold size is reached. When a partition can no longer be divided without falling below the threshold size, the base HMAC algorithm is applied. This results in a significant speedup over the parallel HMAC.
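The recursive partitioning scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the HPAR HMAC source code. Several pieces are assumptions made only for illustration: the base step is a Gaussian kernel mode ascent at a single bandwidth (standing in for the base HMAC algorithm), partitions are split at the median of their widest coordinate, and leaf results are merged by running the base step once more on the pooled modes. The names `mode_ascent`, `split`, `recursive_modes`, and `cluster` are hypothetical.

```python
import math


def mode_ascent(points, bandwidth, tol=1e-6, max_iter=200):
    """Stand-in base step (assumption): climb a Gaussian kernel density
    estimate from every point and return the distinct modes reached.
    Each sweep compares every point with every other point: O(n^2)."""
    modes = []
    for x in points:
        for _ in range(max_iter):
            weights = [
                math.exp(-math.dist(x, p) ** 2 / (2 * bandwidth ** 2))
                for p in points
            ]
            total = sum(weights)
            new_x = tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(len(x))
            )
            shift = math.dist(x, new_x)
            x = new_x
            if shift < tol:
                break
        # Treat modes closer than half a bandwidth as the same mode.
        if not any(math.dist(x, m) < bandwidth / 2 for m in modes):
            modes.append(x)
    return modes


def split(points):
    """Split a partition in two at the median of its widest axis
    (the splitting rule is an assumption; the abstract does not give one)."""
    dims = range(len(points[0]))
    axis = max(dims, key=lambda d: max(p[d] for p in points) - min(p[d] for p in points))
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]


def recursive_modes(points, bandwidth, threshold):
    """Divide until a further split would fall below `threshold`,
    then apply the base step to the leaf partition."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    if len(points) // 2 < threshold:
        return mode_ascent(points, bandwidth)
    left, right = split(points)
    return (recursive_modes(left, bandwidth, threshold)
            + recursive_modes(right, bandwidth, threshold))


def cluster(points, bandwidth, threshold):
    """Pool the leaf modes and merge them with one more base step
    (the merge rule is an assumption)."""
    leaf_modes = recursive_modes(points, bandwidth, threshold)
    return mode_ascent(leaf_modes, bandwidth)


if __name__ == "__main__":
    # Two small, well-separated grids of points as toy input.
    data = [
        (cx + 0.1 * i, cy + 0.1 * j)
        for cx, cy in [(0.0, 0.0), (10.0, 10.0)]
        for i in range(-2, 3)
        for j in range(-2, 3)
    ]
    print(len(cluster(data, bandwidth=1.0, threshold=5)), "modes found")
```

Under the assumption that the base step costs on the order of n^2 kernel evaluations per sweep, splitting n points into k equal leaves lowers that count to roughly k * (n/k)^2 = n^2 / k, which is the divide-and-conquer benefit the abstract refers to. The leaves are independent of one another, so they could also be dispatched to separate workers.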

Authors:
Patlolla, Dilip R; Surendran Nair, Sujithkumar; Graves, Daniel A. [1]
  1. Computational Sciences and Engineering Division, ORNL, Oak Ridge, TN 37830
Publication Date:
2017-01-12
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE
Contributing Org.:
Dilip R. Patlolla and Sujithkumar Surendran Nair
OSTI Identifier:
1365649
Report Number(s):
HPAR HMAC; 005334WKSTN00
DOE Contract Number:  
AC05-00OR22725
Resource Type:
Software
Software Revision:
00
Software Package Number:
005334
Software CPU:
WKSTN
Open Source:
Yes
Source Code Available:
Yes
Country of Publication:
United States

Citation Formats

Patlolla, Dilip R, Surendran Nair, Sujithkumar, and Graves, Daniel A. High Performance Computing Based Parallel HIearchical Modal Association Clustering (HPAR HMAC). Computer software. https://www.osti.gov//servlets/purl/1365649. Vers. 00. USDOE. 12 Jan. 2017. Web.
Patlolla, Dilip R, Surendran Nair, Sujithkumar, & Graves, Daniel A. (2017, January 12). High Performance Computing Based Parallel HIearchical Modal Association Clustering (HPAR HMAC) (Version 00) [Computer software]. https://www.osti.gov//servlets/purl/1365649.
Patlolla, Dilip R, Surendran Nair, Sujithkumar, and Graves, Daniel A. High Performance Computing Based Parallel HIearchical Modal Association Clustering (HPAR HMAC). Computer software. Version 00. January 12, 2017. https://www.osti.gov//servlets/purl/1365649.
@misc{osti_1365649,
title = {High Performance Computing Based Parallel HIearchical Modal Association Clustering (HPAR HMAC), Version 00},
author = {Patlolla, Dilip R and Surendran Nair, Sujithkumar and Graves, Daniel A.},
abstractNote = {For many applications, clustering is a crucial step in gaining insight into the makeup of a dataset. The best approach to a given problem often depends on a variety of factors, such as the size of the dataset, time restrictions, and soft clustering requirements. The HMAC algorithm seeks to combine the strengths of two clustering approaches: model-based and linkage-based clustering. One particular weakness of HMAC is its computational complexity, which makes it impractical for mega-scale data clustering. For high-definition imagery, a user would have to wait months or years for a result; for a 16-megapixel image, the estimated runtime exceeds a decade. To improve the execution time of HMAC, it is reasonable to consider a multi-core implementation that utilizes available system resources. An existing implementation (Ray and Cheng 2014) divides the dataset into N partitions, one for each thread, prior to executing the HMAC algorithm. This implementation benefits from two types of optimization: parallelization and divide-and-conquer. By running each partition in parallel, the program accelerates computation by utilizing more system resources. Although the parallel implementation provides considerable improvement over the serial HMAC, it still suffers from poor computational complexity, $O(N^2)$. Once the maximum number of cores on a system is exhausted, performance degrades. We now consider a modification to HMAC that involves a recursive partitioning scheme. Our modification aims to exploit the divide-and-conquer benefits seen in the parallel HMAC implementation. At each level in the recursion tree, partitions are divided into two sub-partitions until a threshold size is reached. When a partition can no longer be divided without falling below the threshold size, the base HMAC algorithm is applied. This results in a significant speedup over the parallel HMAC.},
url = {https://www.osti.gov//servlets/purl/1365649},
doi = {},
year = {2017},
month = {jan}
}

Software:
To initiate an order for this software, request consultation services, or receive further information, fill out the OSTI software request form.

OSTI staff will begin to process an order for scientific and technical software once the payment and signed site license agreement are received. If the forms are not in order, OSTI will contact you. No further action will be taken until all required information and/or payment is received. Orders are usually processed within three to five business days.
