HiSpatialCluster: A novel high‐performance software tool for clustering massive spatial points
Abstract
In the era of big data, spatial clustering is a very important means for geo‐data analysis. When clustering big geo‐data such as social media check‐in data, geotagged photos, and taxi trajectory points, traditional spatial clustering algorithms are facing more challenges. On the one hand, existing spatial clustering tools cannot support the clustering of massive point sets; on the other hand, there is no perfect solution for self‐adaptive spatial clustering. In order to achieve clustering of millions or even billions of points adaptively, a new spatial clustering tool, HiSpatialCluster, was proposed, in which the CFSFDP (clustering by fast search and finding density peaks) idea to find cluster centers and the DBSCAN (density‐based spatial clustering of applications with noise) idea of density‐connect filtering for classification are introduced. The tool’s source code and other resources have been released on GitHub, and experimental evaluation was performed through clustering massive taxi trajectory points and Flickr geotagged photos in Beijing, China. The spatial clustering results were compared with those through K‐means and DBSCAN as well. As a spatial clustering tool, HiSpatialCluster is expected to play a fundamental role in big geo‐data research. First, this tool enables clustering adaptively on massive point datasets with uneven spatial density distribution. Second, the density‐connect filter method is applied to generate homogeneous analysis units from geotagged data. Third, the tool is accelerated by both parallel CPU and GPU computing so that millions or even billions of points can be clustered efficiently.
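The two ideas named in the abstract can be illustrated with a small sketch. It is not taken from the HiSpatialCluster source code: the function names, the parameters (cutoff, n_centers, min_rho), and the exact form of the density-connect filter are assumptions made for illustration, and the brute-force distance matrix only suits small inputs. The first function picks cluster centers as points with high local density that lie far from any denser point (the density-peaks idea); the second keeps only the points reachable from a center through dense neighbours, in the spirit of DBSCAN, and marks the rest as noise.

```python
import numpy as np


def density_peak_clusters(points, cutoff, n_centers):
    """Toy density-peaks clustering (illustrative only, O(n^2) memory)."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    # Local density rho: number of other points closer than the cutoff.
    rho = (dist < cutoff).sum(axis=1) - 1
    # Visit points from densest to sparsest; ties keep index order.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    parent = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbour: use its largest distance.
            delta[i] = dist[i].max()
            continue
        denser = order[:rank]
        j = denser[np.argmin(dist[i, denser])]
        delta[i] = dist[i, j]  # distance to the nearest denser point
        parent[i] = j
    # Centers are the points with the largest rho * delta.
    gamma = rho * delta
    centers = order[np.argsort(-gamma[order], kind="stable")[:n_centers]]
    labels = np.full(n, -1)
    labels[centers] = np.arange(len(centers))
    # Every other point inherits the label of its nearest denser point.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers, rho, dist


def density_connect_filter(dist, labels, centers, rho, cutoff, min_rho):
    """Keep points reachable from their center via dense neighbours (assumed form)."""
    filtered = np.full(len(labels), -1)  # -1 marks noise
    for k, c in enumerate(centers):
        filtered[c] = k
        stack = [c]
        while stack:
            i = stack.pop()
            if rho[i] < min_rho:
                continue  # sparse points are kept but not expanded further
            nbrs = np.where((dist[i] < cutoff) & (labels == k) & (filtered == -1))[0]
            filtered[nbrs] = k
            stack.extend(nbrs.tolist())
    return filtered


# Two tight groups of four points plus one isolated point.
pts = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
       (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1),
       (2.5, 9)]
labels, centers, rho, dist = density_peak_clusters(pts, cutoff=0.5, n_centers=2)
filtered = density_connect_filter(dist, labels, centers, rho, cutoff=0.5, min_rho=1)
print(labels.tolist())    # [0, 0, 0, 0, 1, 1, 1, 1, 1]
print(filtered.tolist())  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

In this toy run the isolated point is first attached to the cluster of its nearest denser point and is then relabelled as noise by the filter, because no dense point of that cluster lies within the cutoff distance of it.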
- Authors:
- Chen, Yiran; Huang, Zhou; Pei, Tao; Liu, Yu
- Institute of Remote Sensing and Geographical Information Systems, Peking University, Beijing, China; Beijing Key Lab of Spatial Information Integration & Its Applications, Peking University, Beijing, China
- State Key Laboratory of Resources and Environmental Information System, Institute of Geographical Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, China
- Publication Date:
- 2018-08-22
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1466059
- Resource Type:
- Publisher's Accepted Manuscript
- Journal Name:
- Transactions in GIS
- Additional Journal Information:
- Journal Name: Transactions in GIS; Journal Volume: 22; Journal Issue: 5; Journal ID: ISSN 1361-1682
- Publisher:
- Wiley-Blackwell
- Country of Publication:
- Country unknown/Code not available
- Language:
- English
Citation Formats
Chen, Yiran, Huang, Zhou, Pei, Tao, and Liu, Yu. HiSpatialCluster: A novel high‐performance software tool for clustering massive spatial points. Country unknown/Code not available: N. p., 2018. Web. doi:10.1111/tgis.12463.
Chen, Yiran, Huang, Zhou, Pei, Tao, & Liu, Yu. (2018). HiSpatialCluster: A novel high‐performance software tool for clustering massive spatial points. Country unknown/Code not available. https://doi.org/10.1111/tgis.12463
Chen, Yiran, Huang, Zhou, Pei, Tao, and Liu, Yu. 2018. "HiSpatialCluster: A novel high‐performance software tool for clustering massive spatial points". Country unknown/Code not available. https://doi.org/10.1111/tgis.12463.
@article{osti_1466059,
title = {HiSpatialCluster: A novel high‐performance software tool for clustering massive spatial points},
author = {Chen, Yiran and Huang, Zhou and Pei, Tao and Liu, Yu},
abstractNote = {In the era of big data, spatial clustering is a very important means for geo‐data analysis. When clustering big geo‐data such as social media check‐in data, geotagged photos, and taxi trajectory points, traditional spatial clustering algorithms are facing more challenges. On the one hand, existing spatial clustering tools cannot support the clustering of massive point sets; on the other hand, there is no perfect solution for self‐adaptive spatial clustering. In order to achieve clustering of millions or even billions of points adaptively, a new spatial clustering tool, HiSpatialCluster, was proposed, in which the CFSFDP (clustering by fast search and finding density peaks) idea to find cluster centers and the DBSCAN (density‐based spatial clustering of applications with noise) idea of density‐connect filtering for classification are introduced. The tool’s source code and other resources have been released on GitHub, and experimental evaluation was performed through clustering massive taxi trajectory points and Flickr geotagged photos in Beijing, China. The spatial clustering results were compared with those through K‐means and DBSCAN as well. As a spatial clustering tool, HiSpatialCluster is expected to play a fundamental role in big geo‐data research. First, this tool enables clustering adaptively on massive point datasets with uneven spatial density distribution. Second, the density‐connect filter method is applied to generate homogeneous analysis units from geotagged data. Third, the tool is accelerated by both parallel CPU and GPU computing so that millions or even billions of points can be clustered efficiently.},
doi = {10.1111/tgis.12463},
journal = {Transactions in GIS},
number = 5,
volume = 22,
place = {Country unknown/Code not available},
year = {2018},
month = {8}
}
https://doi.org/10.1111/tgis.12463
Works referenced in this record:
A new approach to the nearest‐neighbour method to discover cluster features in overlaid spatial point processes
journal, February 2006
- Pei, Tao; Zhu, A‐Xing; Zhou, Chenghu
- International Journal of Geographical Information Science, Vol. 20, Issue 2
ST-DBSCAN: An algorithm for clustering spatial–temporal data
journal, January 2007
- Birant, Derya; Kut, Alp
- Data & Knowledge Engineering, Vol. 60, Issue 1
Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data
conference, December 2013
- Ertöz, Levent; Steinbach, Michael; Kumar, Vipin
- Proceedings of the 2003 SIAM International Conference on Data Mining
Detecting tourism destinations using scalable geospatial analysis based on cloud computing platform
journal, November 2015
- Zhou, Xiaolu; Xu, Chen; Kimmons, Brandon
- Computers, Environment and Urban Systems, Vol. 54
CURE: an efficient clustering algorithm for large databases
journal, June 1998
- Guha, Sudipto; Rastogi, Rajeev; Shim, Kyuseok
- ACM SIGMOD Record, Vol. 27, Issue 2
Multi-scale decomposition of point process data
journal, August 2012
- Pei, Tao; Gao, Jianhuan; Ma, Ting
- GeoInformatica, Vol. 16, Issue 4
ACOMCD: A multiple cluster detection algorithm based on the spatial scan statistic and ant colony optimization
journal, February 2012
- Wan, You; Pei, Tao; Zhou, Chenghu
- Computational Statistics & Data Analysis, Vol. 56, Issue 2
Argument free clustering for large spatial point-data sets via boundary extraction from Delaunay Diagram
journal, July 2002
- Estivill-Castro, V.; Lee, I.
- Computers, Environment and Urban Systems, Vol. 26, Issue 4
Trajectory clustering: a partition-and-group framework
conference, January 2007
- Lee, Jae-Gil; Han, Jiawei; Whang, Kyu-Young
- Proceedings of the 2007 ACM SIGMOD international conference on Management of data - SIGMOD '07
OPTICS: ordering points to identify the clustering structure
journal, June 1999
- Ankerst, Mihael; Breunig, Markus M.; Kriegel, Hans-Peter
- ACM SIGMOD Record, Vol. 28, Issue 2
Exploration of geo-tagged photos through data mining approaches
journal, February 2014
- Lee, Ickjai; Cai, Guochen; Lee, Kyungmi
- Expert Systems with Applications, Vol. 41, Issue 2
Identifying points of interest by self-tuning clustering
conference, January 2011
- Yang, Yiyang; Gong, Zhiguo; U., Leong Hou
- Proceedings of the 34th international ACM SIGIR conference on Research and development in Information - SIGIR '11
P-DBSCAN: a density based clustering algorithm for exploration and analysis of attractive areas using collections of geo-tagged photos
conference, January 2010
- Kisilevich, Slava; Mansmann, Florian; Keim, Daniel
- Proceedings of the 1st International Conference and Exhibition on Computing for Geospatial Research & Application - COM.Geo '10
Clustering by fast search and find of density peaks
journal, June 2014
- Rodriguez, A.; Laio, A.
- Science, Vol. 344, Issue 6191
Combining partitional and hierarchical algorithms for robust and efficient data clustering with cohesion self-merging
journal, February 2005
- Lin, Cheng-Ru
- IEEE Transactions on Knowledge and Data Engineering, Vol. 17, Issue 2
Discovering regions of different functions in a city using human mobility and POIs
conference, January 2012
- Yuan, Jing; Zheng, Yu; Xie, Xing
- Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '12
Exploring the travel behaviors of inbound tourists to Hong Kong using geotagged photos
journal, February 2015
- Vu, Huy Quan; Li, Gang; Law, Rob
- Tourism Management, Vol. 46
Detecting feature from spatial point processes using Collective Nearest Neighbor
journal, November 2009
- Pei, Tao; Zhu, A-Xing; Zhou, Chenghu
- Computers, Environment and Urban Systems, Vol. 33, Issue 6
CLARANS: a method for clustering objects for spatial data mining
journal, September 2002
- Ng, R. T.
- IEEE Transactions on Knowledge and Data Engineering, Vol. 14, Issue 5
An adaptive spatial clustering algorithm based on delaunay triangulation
journal, July 2011
- Deng, Min; Liu, Qiliang; Cheng, Tao
- Computers, Environment and Urban Systems, Vol. 35, Issue 4
Discovering Spatial Patterns in Origin-Destination Mobility Data
journal, May 2012
- Guo, Diansheng; Zhu, Xi; Jin, Hai
- Transactions in GIS, Vol. 16, Issue 3
Mining city landmarks from blogs by graph modeling
conference, January 2009
- Ji, Rongrong; Xie, Xing; Yao, Hongxun
- Proceedings of the seventeen ACM international conference on Multimedia - MM '09
Mining Points-of-Interest Association Rules from Geo-tagged Photos
conference, January 2013
- Lee, Ickjai; Cai, Guochen; Lee, Kyungmi
- 2013 46th Hawaii International Conference on System Sciences (HICSS)