DOE PAGES
U.S. Department of Energy
Office of Scientific and Technical Information

Title: HiSpatialCluster: A novel high‐performance software tool for clustering massive spatial points

Abstract

In the era of big data, spatial clustering is an important means of geo‐data analysis. Traditional spatial clustering algorithms face growing challenges when applied to big geo‐data such as social media check‐in data, geotagged photos, and taxi trajectory points. On the one hand, existing spatial clustering tools cannot support the clustering of massive point sets; on the other hand, there is no perfect solution for self‐adaptive spatial clustering. To cluster millions or even billions of points adaptively, a new spatial clustering tool, HiSpatialCluster, was proposed. It combines the CFSFDP (clustering by fast search and find of density peaks) idea for finding cluster centers with the DBSCAN (density‐based spatial clustering of applications with noise) idea of density‐connect filtering for classification. The tool's source code and other resources have been released on GitHub, and the tool was evaluated experimentally by clustering massive taxi trajectory points and Flickr geotagged photos in Beijing, China. The clustering results were also compared with those of K‐means and DBSCAN. As a spatial clustering tool, HiSpatialCluster is expected to play a fundamental role in big geo‐data research. First, the tool enables adaptive clustering of massive point datasets with uneven spatial density distribution. Second, the density‐connect filter method is applied to generate homogeneous analysis units from geotagged data. Third, the tool is accelerated by both parallel CPU and GPU computing so that millions or even billions of points can be clustered efficiently.
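The two ideas named in the abstract can be illustrated with a small sketch. The code below is not taken from HiSpatialCluster; it is a minimal pure-Python illustration of the density-peaks (CFSFDP) idea, in which each point gets a local density `rho` and a distance `delta` to its nearest denser point, and points with a large `rho * delta` are treated as cluster centers. The function names, the `cutoff`, `n_centers`, and `link_dist` parameters, and the choice to pick a fixed number of centers are all assumptions made for illustration. In particular, `link_dist` is only a crude stand-in for a density-connect filter, not the rule the tool applies.

```python
import math


def density_peaks(points, cutoff):
    """Compute density-peaks quantities for a list of 2-D points.

    rho[i]    : number of other points closer than `cutoff` to point i
    delta[i]  : distance from point i to its nearest denser point
    parent[i] : index of that nearest denser point (-1 for the densest point)
    order     : point indices sorted by decreasing density (ties by index)
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
           for i in range(n)]
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbor; use its largest distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j
    return rho, delta, parent, order


def cluster(points, cutoff, n_centers, link_dist=None):
    """Label points by following each point to its nearest denser neighbor.

    `link_dist` is a hypothetical stand-in for a density-connect filter:
    a non-center point whose nearest denser neighbor is farther away than
    `link_dist` is labeled as noise (-1), and so are points that chain to it.
    """
    if n_centers < 1:
        raise ValueError("n_centers must be at least 1")
    rho, delta, parent, order = density_peaks(points, cutoff)
    # Stable sort over `order` keeps the densest point first among ties,
    # so it is always selected as a center.
    centers = sorted(order, key=lambda i: -rho[i] * delta[i])[:n_centers]
    labels = [-1] * len(points)
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    center_set = set(centers)
    for i in order:  # decreasing density, so parents are labeled first
        if i in center_set:
            continue
        too_far = link_dist is not None and delta[i] > link_dist
        labels[i] = -1 if too_far else labels[parent[i]]
    return labels


if __name__ == "__main__":
    blob_a = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
    blob_b = [(10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    print(cluster(blob_a + blob_b + [(30, 30)], cutoff=1.0,
                  n_centers=2, link_dist=2.0))
```

This sketch builds a full pairwise distance matrix, so it is quadratic in the number of points and only suitable for small inputs. Scaling to millions of points, as the abstract describes, would need spatial indexing and the parallel CPU/GPU execution that the tool provides.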

Authors:
Chen, Yiran [1]; Huang, Zhou [1]; Pei, Tao [2]; Liu, Yu [1]
  1. Institute of Remote Sensing and Geographical Information Systems, Peking University, Beijing, China; Beijing Key Lab of Spatial Information Integration & Its Applications, Peking University, Beijing, China
  2. State Key Laboratory of Resources and Environmental Information System, Institute of Geographical Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, China
Publication Date:
2018-08-22
Sponsoring Org.:
USDOE
OSTI Identifier:
1466059
Resource Type:
Publisher's Accepted Manuscript
Journal Name:
Transactions in GIS
Additional Journal Information:
Journal Name: Transactions in GIS; Journal Volume: 22; Journal Issue: 5; Journal ID: ISSN 1361-1682
Publisher:
Wiley-Blackwell
Country of Publication:
Country unknown/Code not available
Language:
English

Citation Formats

Chen, Yiran, Huang, Zhou, Pei, Tao, and Liu, Yu. HiSpatialCluster: A novel high‐performance software tool for clustering massive spatial points. Country unknown/Code not available: N. p., 2018. Web. doi:10.1111/tgis.12463.
Chen, Yiran, Huang, Zhou, Pei, Tao, & Liu, Yu. HiSpatialCluster: A novel high‐performance software tool for clustering massive spatial points. Country unknown/Code not available. https://doi.org/10.1111/tgis.12463
Chen, Yiran, Huang, Zhou, Pei, Tao, and Liu, Yu. 2018. "HiSpatialCluster: A novel high‐performance software tool for clustering massive spatial points". Country unknown/Code not available. https://doi.org/10.1111/tgis.12463.
@article{osti_1466059,
title = {HiSpatialCluster: A novel high‐performance software tool for clustering massive spatial points},
author = {Chen, Yiran and Huang, Zhou and Pei, Tao and Liu, Yu},
abstractNote = {In the era of big data, spatial clustering is an important means of geo‐data analysis. Traditional spatial clustering algorithms face growing challenges when applied to big geo‐data such as social media check‐in data, geotagged photos, and taxi trajectory points. On the one hand, existing spatial clustering tools cannot support the clustering of massive point sets; on the other hand, there is no perfect solution for self‐adaptive spatial clustering. To cluster millions or even billions of points adaptively, a new spatial clustering tool, HiSpatialCluster, was proposed. It combines the CFSFDP (clustering by fast search and find of density peaks) idea for finding cluster centers with the DBSCAN (density‐based spatial clustering of applications with noise) idea of density‐connect filtering for classification. The tool's source code and other resources have been released on GitHub, and the tool was evaluated experimentally by clustering massive taxi trajectory points and Flickr geotagged photos in Beijing, China. The clustering results were also compared with those of K‐means and DBSCAN. As a spatial clustering tool, HiSpatialCluster is expected to play a fundamental role in big geo‐data research. First, the tool enables adaptive clustering of massive point datasets with uneven spatial density distribution. Second, the density‐connect filter method is applied to generate homogeneous analysis units from geotagged data. Third, the tool is accelerated by both parallel CPU and GPU computing so that millions or even billions of points can be clustered efficiently.},
doi = {10.1111/tgis.12463},
journal = {Transactions in GIS},
number = 5,
volume = 22,
place = {Country unknown/Code not available},
year = {2018},
month = {8}
}
