Title: Mr. Scan: extreme scale density-based clustering using a tree-based network of GPGPU nodes, In: SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Conference · 2013 INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC)

Density-based clustering algorithms are a widely used class of data mining techniques that can find irregularly shaped clusters and can cluster data without prior knowledge of the number of clusters it contains. DBSCAN is the best-known density-based clustering algorithm. We introduce our version of DBSCAN, called Mr. Scan, which uses a hybrid parallel implementation that combines the MRNet tree-based distribution network with GPGPU-equipped nodes. Mr. Scan avoids the problems of existing implementations by effectively partitioning the point space and by optimizing DBSCAN's computation over dense data regions. We tested Mr. Scan on both a geolocated Twitter dataset and image data obtained from the Sloan Digital Sky Survey. At its largest scale, Mr. Scan clustered 6.5 billion points from the Twitter dataset on 8,192 GPU nodes on Cray Titan in 17.3 minutes. All other parallel DBSCAN implementations have demonstrated clustering of at most 100 million points.
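
For readers unfamiliar with DBSCAN, the following is a minimal single-node sketch of the textbook algorithm in Python. It is not Mr. Scan's implementation: it has no MRNet tree, no GPGPU kernels, no point-space partitioning, and no dense-region optimization, and it uses a brute-force neighbor search. The names `dbscan`, `region_query`, `eps`, and `min_pts` are illustrative choices, not taken from the paper.

```python
from collections import deque
from math import dist

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN with a brute-force O(n^2) neighbor search.

    Returns one label per point: 0, 1, 2, ... for clusters, NOISE otherwise.
    The number of clusters is discovered, not supplied by the caller.
    """
    labels = [None] * len(points)  # None means "not yet visited"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # points[i] is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point; keep expanding
        cluster_id += 1
    return labels


# Two well-separated groups of four points plus one isolated point.
sample = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
sample_labels = dbscan(sample, eps=1.5, min_pts=3)
```

In this sketch every call to `region_query` scans all points, so the cost grows quadratically with the input size. That cost is what makes a single-node approach impractical at the scales reported above.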

Research Organization:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Organization:
USDOE Office of Science (SC)
OSTI ID:
1567345
Journal Information:
2013 INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC), Conference: International Conference on High Performance Computing, Networking, Storage and Analysis, Denver, Colorado, November 17-21, 2013
Country of Publication:
United States
Language:
English

Similar Records

The Anatomy of Mr. Scan: A Dissection of Performance of an Extreme Scale GPU-Based Clustering Algorithm, In: 2014 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems
Conference · November 1, 2014 · 2014 5TH WORKSHOP ON LATEST ADVANCES IN SCALABLE ALGORITHMS FOR LARGE-SCALE SYSTEMS (SCALA) · OSTI ID:1567345

Integrated genome-based studies of Shewanella ecophysiology
Technical Report · February 14, 2012 · OSTI ID:1567345

Data depth based clustering analysis
Conference · January 1, 2016 · OSTI ID:1567345

Related Subjects