A natural language processing and geospatial clustering framework for harvesting local place names from geotagged housing advertisements
Abstract
We report that local place names are frequently used by residents living in a geographic region. Such place names may not be recorded in existing gazetteers, due to their vernacular nature, relative insignificance to a gazetteer covering a large area (e.g. the entire world), recent establishment (e.g. the name of a newly-opened shopping center) or other reasons. While not always recorded, local place names play important roles in many applications, from supporting public participation in urban planning to locating victims in disaster response. In this paper, we propose a computational framework for harvesting local place names from geotagged housing advertisements. We make use of those advertisements posted on local-oriented websites, such as Craigslist, where local place names are often mentioned. The proposed framework consists of two stages: natural language processing (NLP) and geospatial clustering. The NLP stage examines the textual content of housing advertisements and extracts place name candidates. The geospatial stage focuses on the coordinates associated with the extracted place name candidates and performs multiscale geospatial clustering to filter out the non-place names. We evaluate our framework by comparing its performance with those of six baselines. Finally, we also compare our result with four existing gazetteers to demonstrate the not-yet-recorded local place names discovered by our framework.
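The second stage described in the abstract can be illustrated with a small sketch. The abstract does not name the clustering algorithm, so the code below substitutes a simple stand-in: it links mentions of a candidate that lie within a given radius of each other and keeps the candidate if, at any of several radii, most of its mentions fall into one group. The function names, radii, thresholds, and sample coordinates are all hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math
from collections import defaultdict


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def largest_cluster_share(points, radius_km):
    """Share of points in the largest group linked by hops of <= radius_km."""
    unvisited = set(range(len(points)))
    best = 0
    while unvisited:
        stack = [unvisited.pop()]
        size = 0
        while stack:
            i = stack.pop()
            size += 1
            near = [j for j in unvisited
                    if haversine_km(points[i], points[j]) <= radius_km]
            unvisited.difference_update(near)
            stack.extend(near)
        best = max(best, size)
    return best / len(points)


def filter_candidates(mentions, radii_km=(0.5, 1.0, 2.0),
                      min_share=0.8, min_mentions=3):
    """Keep candidates whose mentions are spatially concentrated at any scale.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    by_name = defaultdict(list)
    for name, lat, lon in mentions:
        by_name[name].append((lat, lon))
    kept = []
    for name, pts in by_name.items():
        if len(pts) < min_mentions:
            continue  # too few mentions to judge spatial concentration
        if any(largest_cluster_share(pts, r) >= min_share for r in radii_km):
            kept.append(name)
    return sorted(kept)


# Hypothetical candidates as they might come out of the NLP stage:
# (candidate text, latitude, longitude of the advertisement).
mentions = [
    ("Market Square", 35.9646, -83.9193),
    ("Market Square", 35.9650, -83.9188),
    ("Market Square", 35.9641, -83.9200),
    ("Market Square", 35.9655, -83.9195),
    ("spacious", 35.9646, -83.9193),
    ("spacious", 36.0500, -84.0500),
    ("spacious", 35.8700, -83.7500),
    ("spacious", 35.9900, -84.2000),
]

print(filter_candidates(mentions))
```

In this toy input, the mentions of "Market Square" sit within a few hundred metres of each other, while "spacious" appears in advertisements scattered across the region, so only the former survives the filter. A real pipeline would likely use an established clustering method and tuned scales in place of this stand-in.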
- Authors:
- Hu, Yingjie; Mao, Huina; Mckenzie, Grant
- Univ. of Tennessee, Knoxville, TN (United States). Department of Geography
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Geographic Information Science and Technology Group
- Univ. of Maryland, College Park, MD (United States). Department of Geographical Sciences
- Publication Date:
- Research Org.:
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1435186
- Grant/Contract Number:
- AC05-00OR22725
- Resource Type:
- Accepted Manuscript
- Journal Name:
- International Journal of Geographical Information Science
- Additional Journal Information:
- Journal Volume: 33; Journal Issue: 4; Journal ID: ISSN 1365-8816
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; Local place name; gazetteer; natural language processing; named entity recognition; geospatial clustering; geospatial semantics
Citation Formats
Hu, Yingjie, Mao, Huina, and Mckenzie, Grant. A natural language processing and geospatial clustering framework for harvesting local place names from geotagged housing advertisements. United States: N. p., 2018. Web. doi:10.1080/13658816.2018.1458986.
Hu, Yingjie, Mao, Huina, & Mckenzie, Grant. A natural language processing and geospatial clustering framework for harvesting local place names from geotagged housing advertisements. United States. https://doi.org/10.1080/13658816.2018.1458986
Hu, Yingjie, Mao, Huina, and Mckenzie, Grant. Fri .
"A natural language processing and geospatial clustering framework for harvesting local place names from geotagged housing advertisements". United States. https://doi.org/10.1080/13658816.2018.1458986. https://www.osti.gov/servlets/purl/1435186.
@article{osti_1435186,
title = {A natural language processing and geospatial clustering framework for harvesting local place names from geotagged housing advertisements},
author = {Hu, Yingjie and Mao, Huina and Mckenzie, Grant},
abstractNote = {We report that local place names are frequently used by residents living in a geographic region. Such place names may not be recorded in existing gazetteers, due to their vernacular nature, relative insignificance to a gazetteer covering a large area (e.g. the entire world), recent establishment (e.g. the name of a newly-opened shopping center) or other reasons. While not always recorded, local place names play important roles in many applications, from supporting public participation in urban planning to locating victims in disaster response. In this paper, we propose a computational framework for harvesting local place names from geotagged housing advertisements. We make use of those advertisements posted on local-oriented websites, such as Craigslist, where local place names are often mentioned. The proposed framework consists of two stages: natural language processing (NLP) and geospatial clustering. The NLP stage examines the textual content of housing advertisements and extracts place name candidates. The geospatial stage focuses on the coordinates associated with the extracted place name candidates and performs multiscale geospatial clustering to filter out the non-place names. We evaluate our framework by comparing its performance with those of six baselines. Finally, we also compare our result with four existing gazetteers to demonstrate the not-yet-recorded local place names discovered by our framework.},
doi = {10.1080/13658816.2018.1458986},
journal = {International Journal of Geographical Information Science},
number = 4,
volume = 33,
place = {United States},
year = {2018},
month = {4}
}
Works referencing / citing this record:
Representation and analytical models for location-based big data
journal, January 2019
- Yao, Xiaobai A.; Huang, Haosheng; Jiang, Bin
- International Journal of Geographical Information Science, Vol. 33, Issue 4
Placial analysis of events: a case study on criminological places
journal, January 2019
- Cho, Sunghwan; Yuan, May
- Cartography and Geographic Information Science, Vol. 46, Issue 6
A name‐led approach to profile urban places based on geotagged Twitter data
journal, December 2019
- Lai, Juntao; Lansley, Guy; Haworth, James
- Transactions in GIS, Vol. 24, Issue 4
Identifying Urban Neighborhood Names through User-Contributed Online Property Listings
journal, September 2018
- McKenzie, Grant; Liu, Zheng; Hu, Yingjie
- ISPRS International Journal of Geo-Information, Vol. 7, Issue 10
Modeling Housing Rent in the Atlanta Metropolitan Area Using Textual Information and Deep Learning
journal, August 2019
- Zhou, Xiaolu; Tong, Weitian; Li, Dongying
- ISPRS International Journal of Geo-Information, Vol. 8, Issue 8