OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: High Performance Computing Based Parallel Hierarchical Modal Association Clustering (HPAR HMAC)

Abstract

For many applications, clustering is a crucial step in gaining insight into the makeup of a dataset. The best approach to a given problem often depends on a variety of factors, such as the size of the dataset, time restrictions, and soft clustering requirements. The HMAC algorithm seeks to combine the strengths of two clustering approaches: model-based and linkage-based clustering. One weakness of HMAC is its computational complexity, which makes it impractical for mega-scale data clustering. For high-definition imagery, a user would have to wait months or years for a result; for a 16-megapixel image, the estimated runtime exceeds a decade. To improve the execution time of HMAC, it is reasonable to consider a multi-core implementation that makes use of the available system resources. An existing implementation (Ray and Cheng 2014) divides the dataset into one partition per thread before executing the HMAC algorithm. This implementation benefits from two types of optimization: parallelization and divide-and-conquer. By running the partitions in parallel, the program accelerates computation with more system resources. Although the parallel implementation provides a considerable improvement over the serial HMAC, it still has poor computational complexity, O(N^2) for N data points. Once all of the cores on a system are in use, the program slows down.

We now consider a modification to HMAC that uses a recursive partitioning scheme. Our modification aims to exploit the divide-and-conquer benefits seen in the parallel HMAC implementation. At each level of the recursion tree, a partition is divided into two sub-partitions until a threshold size is reached. When a partition can no longer be divided without falling below the threshold size, the base HMAC algorithm is applied to it. This results in a significant speedup over the parallel HMAC.
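A rough cost model helps show why recursive halving pays off. It assumes, following the O(N^2) complexity stated above, that base HMAC on n points costs about c n^2 for some constant c, and it ignores the cost of splitting partitions and combining their results. The symbols c, p (number of partitions in the parallel scheme), T (threshold size), and n_ℓ (size of leaf ℓ) are introduced here for illustration only. Under the splitting rule above, every leaf holds fewer than 2T points, and the leaf sizes sum to N.

```latex
\begin{aligned}
W_{\mathrm{serial}}(N)       &\approx c\,N^2, \\
W_{\mathrm{parallel}}(N, p)  &\approx p \cdot c\left(\frac{N}{p}\right)^2 = \frac{c\,N^2}{p}, \\
W_{\mathrm{recursive}}(N, T) &\approx \sum_{\ell} c\,n_\ell^2 \;\le\; c\,(2T)\sum_{\ell} n_\ell \;=\; 2\,c\,T\,N.
\end{aligned}
```

For a fixed threshold T, the total work of the recursive scheme grows linearly in N, whereas the parallel scheme's work remains quadratic in N for any fixed p.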

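The recursive partitioning scheme described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the HPAR HMAC code. The base step is a single-bandwidth Modal EM, which for equal-weight isotropic Gaussian kernels reduces to a Gaussian mean-shift update, rather than the full multi-bandwidth HMAC hierarchy. Partitions are assumed to be split in half by index order (for example, pixel order), and modes found in different leaves are not merged. The names `recursive_partition_cluster`, `base_cluster`, `modal_ascent`, `threshold`, `sigma`, and `merge_tol` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def modal_ascent(x: Point, data: Sequence[Point], sigma: float,
                 tol: float = 1e-6, max_iter: int = 500) -> Point:
    """Climb from x to a mode of an equal-weight Gaussian kernel density.

    Assumption: isotropic kernels with one shared bandwidth `sigma`, so each
    Modal EM iteration is a kernel-weighted mean of the data (mean shift).
    """
    for _ in range(max_iter):
        d2 = [_sq_dist(x, p) for p in data]
        shift = min(d2)  # subtract the smallest distance to avoid underflow
        w = [math.exp(-(d - shift) / (2.0 * sigma ** 2)) for d in d2]
        total = sum(w)  # >= 1, because the nearest point has weight exp(0)
        new_x = tuple(sum(wi * p[k] for wi, p in zip(w, data)) / total
                      for k in range(len(x)))
        if _sq_dist(new_x, x) < tol ** 2:
            return new_x
        x = new_x
    return x


def base_cluster(data: Sequence[Point], sigma: float,
                 merge_tol: float) -> List[int]:
    """Hypothetical stand-in for base HMAC: points sharing a mode share a label."""
    modes: List[Point] = []
    labels: List[int] = []
    for p in data:
        m = modal_ascent(p, data, sigma)
        for j, rep in enumerate(modes):
            if _sq_dist(m, rep) <= merge_tol ** 2:
                labels.append(j)
                break
        else:  # no known mode is close enough: start a new cluster
            modes.append(m)
            labels.append(len(modes) - 1)
    return labels


def recursive_partition_cluster(data: Sequence[Point], threshold: int,
                                sigma: float, merge_tol: float) -> List[int]:
    """Halve partitions until a half would fall below `threshold` points,
    then cluster each leaf with base_cluster. Labels are unique per leaf."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")

    def recurse(part: Sequence[Point], offset: int) -> Tuple[List[int], int]:
        n = len(part)
        if n // 2 < threshold:
            # Leaf: splitting would leave a half smaller than the threshold.
            local = base_cluster(part, sigma, merge_tol)
            count = max(local) + 1 if local else 0
            return [offset + lab for lab in local], offset + count
        mid = n // 2
        # The halves are independent and could run in parallel;
        # this sketch processes them one after the other.
        left, offset = recurse(part[:mid], offset)
        right, offset = recurse(part[mid:], offset)
        return left + right, offset

    labels, _ = recurse(list(data), 0)
    return labels


if __name__ == "__main__":
    points = [(0.0,), (0.1,), (0.2,), (10.0,), (10.1,), (10.2,)]
    print(recursive_partition_cluster(points, threshold=3, sigma=0.5,
                                      merge_tol=0.1))
    # prints [0, 0, 0, 1, 1, 1]
```

With `threshold=3`, the six points are split once into two leaves of three points, and each leaf is clustered on its own. Because this sketch never reconciles labels across leaves, a cluster that straddles a split would receive two labels; a complete implementation would need a combining step, which is left out here.
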
Authors:
Patlolla, Dilip R; Surendran Nair, Sujithkumar; Graves, Daniel A. [1]
  1. Computational Sciences and Engineering Division, ORNL, Oak Ridge, TN 37830
Publication Date:
January 12, 2017
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE
Contributing Org.:
Dilip R. Patlolla and Sujithkumar Surendran Nair
OSTI Identifier:
1365649
Report Number(s):
HPAR HMAC; 005334WKSTN00
DOE Contract Number:
AC05-00OR22725
Resource Type:
Software
Software Revision:
00
Software Package Number:
005334
Software CPU:
WKSTN
Open Source:
Yes
Source Code Available:
Yes
Country of Publication:
United States

Citation Formats

MLA: Patlolla, Dilip R, Surendran Nair, Sujithkumar, and Graves, Daniel A. High Performance Computing Based Parallel Hierarchical Modal Association Clustering (HPAR HMAC). Computer software. https://www.osti.gov//servlets/purl/1365649. Vers. 00. USDOE. 12 Jan. 2017. Web.
APA: Patlolla, Dilip R, Surendran Nair, Sujithkumar, & Graves, Daniel A. (2017, January 12). High Performance Computing Based Parallel Hierarchical Modal Association Clustering (HPAR HMAC) (Version 00) [Computer software]. https://www.osti.gov//servlets/purl/1365649.
Chicago: Patlolla, Dilip R, Surendran Nair, Sujithkumar, and Graves, Daniel A. High Performance Computing Based Parallel Hierarchical Modal Association Clustering (HPAR HMAC). Computer software. Version 00. January 12, 2017. https://www.osti.gov//servlets/purl/1365649.
@misc{osti_1365649,
title = {High Performance Computing Based Parallel Hierarchical Modal Association Clustering ({HPAR HMAC}), Version 00},
author = {Patlolla, Dilip R and Surendran Nair, Sujithkumar and Graves, Daniel A.},
abstractNote = {For many applications, clustering is a crucial step in order to gain insight into the makeup of a dataset. The best approach to a given problem often depends on a variety of factors, such as the size of the dataset, time restrictions, and soft clustering requirements. The HMAC algorithm seeks to combine the strengths of 2 particular clustering approaches: model-based and linkage-based clustering. One particular weakness of HMAC is its computational complexity. HMAC is not practical for mega-scale data clustering. For high-definition imagery, a user would have to wait months or years for a result; for a 16-megapixel image, the estimated runtime skyrockets to over a decade! To improve the execution time of HMAC, it is reasonable to consider a multi-core implementation that utilizes available system resources. An existing implementation (Ray and Cheng 2014) divides the dataset into N partitions, one for each thread, prior to executing the HMAC algorithm. This implementation benefits from 2 types of optimization: parallelization and divide-and-conquer. By running each partition in parallel, the program is able to accelerate computation by utilizing more system resources. Although the parallel implementation provides considerable improvement over the serial HMAC, it still suffers from poor computational complexity, $O(N^2)$. Once the maximum number of cores on a system is exhausted, the program exhibits slower behavior. We now consider a modification to HMAC that involves a recursive partitioning scheme. Our modification aims to exploit divide-and-conquer benefits seen by the parallel HMAC implementation. At each level in the recursion tree, partitions are divided into 2 sub-partitions until a threshold size is reached. When the partition can no longer be divided without falling below threshold size, the base HMAC algorithm is applied. This results in a significant speedup over the parallel HMAC.},
url = {https://www.osti.gov//servlets/purl/1365649},
year = {2017},
month = jan
}

Software:
To order this software, request consultation services, or receive further information, please fill out the OSTI software request form.

Similar Records:
  • Recent advances in both computational hardware and multidisciplinary science have given rise to an unprecedented level of complexity in scientific simulation software. This paper describes an ongoing grassroots effort aimed at addressing complexity in high-performance computing through the use of Component-Based Software Engineering (CBSE). Highlights of the benefits and accomplishments of the Common Component Architecture (CCA) Forum and SciDAC ISIC are given, followed by an illustrative example of how the CCA has been applied to drive scientific discovery in quantum chemistry. Thrusts for future research are also described briefly.
  • xSim is a simulation-based performance investigation toolkit that permits running high-performance computing (HPC) applications in a controlled environment with millions of concurrent execution threads, while observing application performance in a simulated extreme-scale system for hardware/software co-design. The presented work details newly developed features for xSim that permit the injection of MPI process failures, the propagation/detection/notification of such failures within the simulation, and their handling using application-level checkpoint/restart. These new capabilities enable the observation of application behavior and performance under failure within a simulated future-generation HPC system using the most common fault handling technique.
  • High-speed wide area networks are expected to enable innovative applications that integrate geographically distributed, high-performance computing, database, graphics, and networking resources. However, there is as yet little understanding of the higher-level services required to support these applications, or of the techniques required to implement these services in a scalable, secure manner. We report on a large-scale prototyping effort that has yielded some insights into these issues. Building on the hardware base provided by the I-WAY, a national-scale Asynchronous Transfer Mode (ATM) network, we developed an integrated management and application programming system, called I-Soft. This system was deployed at most of the 17 I-WAY sites and used by many of the 60 applications demonstrated on the I-WAY network. In this article, we describe the I-Soft design and report on lessons learned from application experiments.
  • The absence of unbiased and up-to-date comparative evaluations of high-performance computing software complicates a user's search for the appropriate software package. The National HPCC Software Exchange (NHSE) is attacking this problem using an approach that includes independent evaluations of software, incorporation of author and user feedback into the evaluations, and Web access to the evaluations. We are applying this approach to the Parallel Tools Library (PTLIB), a new software repository for parallel systems software and tools, and HPC-Netlib, a high performance branch of the Netlib mathematical software repository. Updating the evaluations with feedback and making them available via the Web helps ensure accuracy and timeliness, and using independent reviewers produces unbiased comparative evaluations difficult to find elsewhere.

OSTI staff will begin to process an order for scientific and technical software once the payment and signed site license agreement are received. If the forms are not in order, OSTI will contact you. No further action will be taken until all required information and/or payment is received. Orders are usually processed within three to five business days.
