HiSpatialCluster: A novel high‐performance software tool for clustering massive spatial points
- Institute of Remote Sensing and Geographical Information Systems, Peking University, Beijing, China; Beijing Key Lab of Spatial Information Integration & Its Applications, Peking University, Beijing, China
- State Key Laboratory of Resources and Environmental Information System, Institute of Geographical Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, China
Abstract In the era of big data, spatial clustering is an important means of geo‐data analysis. Traditional spatial clustering algorithms face growing challenges when applied to big geo‐data such as social media check‐in data, geotagged photos, and taxi trajectory points. On the one hand, existing spatial clustering tools cannot support the clustering of massive point sets; on the other hand, there is no fully satisfactory solution for self‐adaptive spatial clustering. To cluster millions or even billions of points adaptively, a new spatial clustering tool, HiSpatialCluster, is proposed. It combines the CFSFDP (clustering by fast search and finding density peaks) idea for finding cluster centers with the DBSCAN (density‐based spatial clustering of applications with noise) idea of density‐connect filtering for classification. The tool's source code and other resources have been released on GitHub. Experimental evaluation was performed by clustering massive taxi trajectory points and Flickr geotagged photos in Beijing, China, and the results were compared with those of K‐means and DBSCAN. As a spatial clustering tool, HiSpatialCluster is expected to play a fundamental role in big geo‐data research. First, the tool enables adaptive clustering of massive point datasets with uneven spatial density distribution. Second, the density‐connect filter method is applied to generate homogeneous analysis units from geotagged data. Third, the tool is accelerated by both parallel CPU and GPU computing, so that millions or even billions of points can be clustered efficiently.
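The density-peak idea named in the abstract can be illustrated with a small sketch. It follows the generic CFSFDP formulation (local density within a cutoff, distance to the nearest denser point, centers chosen by the largest product of the two) and is not taken from HiSpatialCluster's source. The function name `density_peaks` and the parameters `dc` and `n_centers` are hypothetical, and the density-connect filtering step and the CPU/GPU acceleration are not shown.

```python
import math


def density_peaks(points, dc, n_centers):
    """Toy CFSFDP-style clustering (O(n^2) memory and time, illustration only).

    points    : list of (x, y) tuples
    dc        : cutoff distance used for the local density (assumed parameter)
    n_centers : number of cluster centers to pick (assumed parameter)
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho: number of other points strictly closer than dc.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc)
           for i in range(n)]

    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta: distance to the nearest point that comes earlier in `order`
    # (i.e. a point of higher density); parent remembers which point that is.
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbor; use its largest distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Cluster centers: points with the largest rho * delta.
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], i))[:n_centers]

    # Every other point inherits the label of its nearest denser neighbor.
    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


# Two well-separated groups of five points each.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
       (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels, centers = density_peaks(pts, dc=1.0, n_centers=2)
print(centers)
print(labels)
```

Because each point is labeled in a single pass from its nearest denser neighbor, no iterative refinement is needed; the pairwise distance matrix, however, makes this version unsuitable for large inputs.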
- Sponsoring Organization:
- USDOE
- OSTI ID:
- 1466059
- Journal Information:
- Transactions in GIS, Vol. 22, Issue 5; ISSN 1361-1682
- Publisher:
- Wiley-Blackwell
- Country of Publication:
- Country unknown/Code not available
- Language:
- English