DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: A natural language processing and geospatial clustering framework for harvesting local place names from geotagged housing advertisements

Abstract

We report that local place names are frequently used by residents living in a geographic region. Such place names may not be recorded in existing gazetteers, due to their vernacular nature, relative insignificance to a gazetteer covering a large area (e.g. the entire world), recent establishment (e.g. the name of a newly-opened shopping center) or other reasons. While not always recorded, local place names play important roles in many applications, from supporting public participation in urban planning to locating victims in disaster response. In this paper, we propose a computational framework for harvesting local place names from geotagged housing advertisements. We make use of those advertisements posted on local-oriented websites, such as Craigslist, where local place names are often mentioned. The proposed framework consists of two stages: natural language processing (NLP) and geospatial clustering. The NLP stage examines the textual content of housing advertisements and extracts place name candidates. The geospatial stage focuses on the coordinates associated with the extracted place name candidates and performs multiscale geospatial clustering to filter out the non-place names. We evaluate our framework by comparing its performance with those of six baselines. Finally, we also compare our result with four existing gazetteers to demonstrate the not-yet-recorded local place names discovered by our framework.
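
The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the regular expression is a crude stand-in for the NLP stage, a single dispersion threshold replaces the multiscale geospatial clustering, and the sample advertisements, coordinates, function names and parameter values (`min_mentions`, `max_spread_km`) are all hypothetical.

```python
import math
import re
from collections import defaultdict

# Hypothetical toy data: (advertisement text, latitude, longitude).
ADS = [
    ("Sunny 2BR near Market Square, walk to shops", 35.9655, -83.9195),
    ("Great Deal on loft by Market Square", 35.9650, -83.9190),
    ("Studio steps from Market Square", 35.9658, -83.9201),
    ("Great Deal in the west end, pets ok", 35.9300, -84.0500),
    ("Great Deal close to campus", 36.0200, -83.8500),
]

# Stage 1 (stand-in for the NLP stage): runs of two or more capitalized words.
CANDIDATE = re.compile(r"\b(?:[A-Z][a-z]+ )+[A-Z][a-z]+\b")


def extract_candidates(text):
    """Return the set of candidate place-name phrases found in one ad."""
    return set(CANDIDATE.findall(text))


def haversine_km(p, q):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def harvest(ads, min_mentions=3, max_spread_km=1.0):
    """Stage 2 (simplified): keep candidates whose geotags are concentrated."""
    points = defaultdict(list)
    for text, lat, lon in ads:
        for name in extract_candidates(text):
            points[name].append((lat, lon))

    kept = []
    for name, pts in points.items():
        if len(pts) < min_mentions:
            continue  # too few mentions to judge spatial concentration
        centroid = (sum(p[0] for p in pts) / len(pts),
                    sum(p[1] for p in pts) / len(pts))
        # Mean distance to the centroid as a simple dispersion measure.
        spread = sum(haversine_km(p, centroid) for p in pts) / len(pts)
        if spread <= max_spread_km:
            kept.append(name)
    return sorted(kept)


if __name__ == "__main__":
    print(harvest(ADS))
```

In this toy data, a generic phrase such as "Great Deal" is extracted as a candidate by the first stage but is scattered across the region, so the spatial filter discards it, while a phrase whose mentions concentrate around one location is kept.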

Authors:
Hu, Yingjie [1]; Mao, Huina [2]; McKenzie, Grant [3]
  1. Univ. of Tennessee, Knoxville, TN (United States). Department of Geography
  2. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Geographic Information Science and Technology Group
  3. Univ. of Maryland, College Park, MD (United States). Department of Geographical Sciences
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1435186
Grant/Contract Number:  
AC05-00OR22725
Resource Type:
Accepted Manuscript
Journal Name:
International Journal of Geographical Information Science
Additional Journal Information:
Journal Volume: 33; Journal Issue: 4; Journal ID: ISSN 1365-8816
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Local place name; gazetteer; natural language processing; named entity recognition; geospatial clustering; geospatial semantics

Citation Formats

Hu, Yingjie, Mao, Huina, and Mckenzie, Grant. A natural language processing and geospatial clustering framework for harvesting local place names from geotagged housing advertisements. United States: N. p., 2018. Web. doi:10.1080/13658816.2018.1458986.
Hu, Yingjie, Mao, Huina, & Mckenzie, Grant. A natural language processing and geospatial clustering framework for harvesting local place names from geotagged housing advertisements. United States. https://doi.org/10.1080/13658816.2018.1458986
Hu, Yingjie, Mao, Huina, and Mckenzie, Grant. Fri . "A natural language processing and geospatial clustering framework for harvesting local place names from geotagged housing advertisements". United States. https://doi.org/10.1080/13658816.2018.1458986. https://www.osti.gov/servlets/purl/1435186.
@article{osti_1435186,
title = {A natural language processing and geospatial clustering framework for harvesting local place names from geotagged housing advertisements},
author = {Hu, Yingjie and Mao, Huina and Mckenzie, Grant},
abstractNote = {We report that local place names are frequently used by residents living in a geographic region. Such place names may not be recorded in existing gazetteers, due to their vernacular nature, relative insignificance to a gazetteer covering a large area (e.g. the entire world), recent establishment (e.g. the name of a newly-opened shopping center) or other reasons. While not always recorded, local place names play important roles in many applications, from supporting public participation in urban planning to locating victims in disaster response. In this paper, we propose a computational framework for harvesting local place names from geotagged housing advertisements. We make use of those advertisements posted on local-oriented websites, such as Craigslist, where local place names are often mentioned. The proposed framework consists of two stages: natural language processing (NLP) and geospatial clustering. The NLP stage examines the textual content of housing advertisements and extracts place name candidates. The geospatial stage focuses on the coordinates associated with the extracted place name candidates and performs multiscale geospatial clustering to filter out the non-place names. We evaluate our framework by comparing its performance with those of six baselines. Finally, we also compare our result with four existing gazetteers to demonstrate the not-yet-recorded local place names discovered by our framework.},
doi = {10.1080/13658816.2018.1458986},
journal = {International Journal of Geographical Information Science},
number = 4,
volume = 33,
place = {United States},
year = {2018},
month = {4}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 5 works
Citation information provided by
Web of Science

Works referenced in this record:

Locating place names from place descriptions
journal, December 2013

  • Vasardani, Maria; Winter, Stephan; Richter, Kai-Florian
  • International Journal of Geographical Information Science, Vol. 27, Issue 12
  • DOI: 10.1080/13658816.2013.785550

The role of ontology in improving gazetteer interaction
journal, October 2008

  • Janowicz, K.; Keßler, C.
  • International Journal of Geographical Information Science, Vol. 22, Issue 10
  • DOI: 10.1080/13658810701851461

Automated Footprint Generation from Geotags with Kernel Density Estimation and Support Vector Machines
journal, August 2009


Spatialization of user-generated content to uncover the multirelational world city network
journal, September 2015

  • Salvini, Marco M.; Fabrikant, Sara I.
  • Environment and Planning B: Planning and Design, Vol. 43, Issue 1
  • DOI: 10.1177/0265813515603868

On the identification of the convex hull of a finite set of points in the plane
journal, March 1973


Where's Downtown?: Behavioral Methods for Determining Referents of Vague Spatial Queries
journal, September 2003

  • Montello, Daniel R.; Goodchild, Michael F.; Gottsegen, Jonathon
  • Spatial Cognition & Computation, Vol. 3, Issue 2-3
  • DOI: 10.1080/13875868.2003.9683761

Evaluating Community Engagement through Argumentation Maps—A Public Participation GIS Case Study
journal, January 2009

  • Rinner, Claus; Bird, Michelle
  • Environment and Planning B: Planning and Design, Vol. 36, Issue 4
  • DOI: 10.1068/b34084

Using machine learning methods for disambiguating place references in textual documents
journal, May 2014


Geospatial footprint library of geoparsed text from geocrowdsourcing
journal, July 2016


Using co‐occurrence models for placename disambiguation
journal, March 2008

  • Overell, Simon; Rüger, Stefan
  • International Journal of Geographical Information Science, Vol. 22, Issue 3
  • DOI: 10.1080/13658810701626236

Supporting Accessibility for Blind and Vision-impaired People With a Localized Gazetteer and Open Source Geotechnology
journal, April 2012


The Viterbi algorithm
journal, January 1973


What's in a Name? Place Branding and Toponymic Commodification
journal, January 2014

  • Medway, Dominic; Warnaby, Gary
  • Environment and Planning A: Economy and Space, Vol. 46, Issue 1
  • DOI: 10.1068/a45571

An empirical study of the effects of NLP components on Geographic IR performance
journal, March 2008

  • Stokes, Nicola; Li, Yi; Moffat, Alistair
  • International Journal of Geographical Information Science, Vol. 22, Issue 3
  • DOI: 10.1080/13658810701626210

Modelling vague places with knowledge from the Web
journal, October 2008

  • Jones, C. B.; Purves, R. S.; Clough, P. D.
  • International Journal of Geographical Information Science, Vol. 22, Issue 10
  • DOI: 10.1080/13658810701850547

A conceptual density‐based approach for the disambiguation of toponyms
journal, March 2008

  • Buscaldi, Davide; Rosso, Paulo
  • International Journal of Geographical Information Science, Vol. 22, Issue 3
  • DOI: 10.1080/13658810701626251

POI Pulse: A Multi-granular, Semantic Signature–Based Information Observatory for the Interactive Visualization of Big Geosocial Data
journal, June 2015

  • McKenzie, Grant; Janowicz, Krzysztof; Gao, Song
  • Cartographica: The International Journal for Geographic Information and Geovisualization, Vol. 50, Issue 2
  • DOI: 10.3138/cart.50.2.2662

Efficient generation of simple polygons for characterizing the shape of a set of points in the plane
journal, October 2008


Digital Footprinting: Uncovering Tourists with User-Generated Content
journal, October 2008

  • Girardin, Fabien; Calabrese, Francesco; Fiore, Filippo Dal
  • IEEE Pervasive Computing, Vol. 7, Issue 4
  • DOI: 10.1109/MPRV.2008.71

Geoparsing, GIS, and Textual Analysis: Current Developments in Spatial Humanities Research
journal, March 2015

  • Gregory, Ian; Donaldson, Christopher; Murrieta-Flores, Patricia
  • International Journal of Humanities and Arts Computing, Vol. 9, Issue 1
  • DOI: 10.3366/ijhac.2015.0135

A multistage collaborative 3D GIS to support public participation
journal, December 2013


Analyzing Relatedness by Toponym Co-Occurrences on Web Pages
journal, March 2013

  • Liu, Yu; Wang, Fahui; Kang, Chaogui
  • Transactions in GIS, Vol. 18, Issue 1
  • DOI: 10.1111/tgis.12023

Rebuilding the Great Britain Historical GIS, Part 3: Integrating Qualitative Content for a Sense of Place
journal, January 2014

  • Southall, Humphrey
  • Historical Methods: A Journal of Quantitative and Interdisciplinary History, Vol. 47, Issue 1
  • DOI: 10.1080/01615440.2013.847774

A Reliable Data-Based Bandwidth Selection Method for Kernel Density Estimation
journal, July 1991


GeoCorpora: building a corpus to test and train microblog geoparsers
journal, September 2017

  • Wallgrün, Jan Oliver; Karimzadeh, Morteza; MacEachren, Alan M.
  • International Journal of Geographical Information Science, Vol. 32, Issue 1
  • DOI: 10.1080/13658816.2017.1368523

Geographical information retrieval
journal, March 2008

  • Jones, Christopher B.; Purves, Ross S.
  • International Journal of Geographical Information Science, Vol. 22, Issue 3
  • DOI: 10.1080/13658810701626343

Public Participation GIS and Participatory GIS in the Era of GeoWeb
journal, October 2016


Introduction to digital gazetteer research
journal, October 2008

  • Goodchild, M. F.; Hill, L. L.
  • International Journal of Geographical Information Science, Vol. 22, Issue 10
  • DOI: 10.1080/13658810701850497

Pushed off the map: Toponymy and the politics of place in New York City
journal, February 2017


Improving efficiency and accuracy in multilingual entity extraction
conference, January 2013

  • Daiber, Joachim; Jakob, Max; Hokamp, Chris
  • Proceedings of the 9th International Conference on Semantic Systems - I-SEMANTICS '13
  • DOI: 10.1145/2506182.2506198

GeoTxt: a web API to leverage place references in text
conference, January 2013

  • Karimzadeh, Morteza; Huang, Wenyi; Banerjee, Siddhartha
  • Proceedings of the 7th Workshop on Geographic Information Retrieval - GIR '13
  • DOI: 10.1145/2533888.2533942

DBpedia – A large-scale, multilingual knowledge base extracted from Wikipedia
journal, January 2015

  • Lehmann, Jens; Isele, Robert; Jakob, Max
  • Semantic Web, Vol. 6, Issue 2
  • DOI: 10.3233/SW-140134

Geotagging with local lexicons to build indexes for textually-specified spatial data
conference, March 2010

  • Lieberman, Michael D.; Samet, Hanan; Sankaranarayanan, Jagan
  • 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010)
  • DOI: 10.1109/ICDE.2010.5447903

Introduction to the CoNLL-2003 shared task: language-independent named entity recognition
conference, January 2003

  • Tjong Kim Sang, Erik F.; De Meulder, Fien
  • Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003
  • DOI: 10.3115/1119176.1119195

GeoCorpora: building a corpus to test and train microblog geoparsers
text, January 2017


Spatialization of user-generated content to uncover the multirelational world city network
text, January 2016


The Viterbi algorithm
conference, January 2006

  • Pulford, G.
  • IEE Seminar on Target Tracking: Algorithms and Applications
  • DOI: 10.1049/ic:20060556

Where's Downtown?: Behavioral Methods for Determining Referents of Vague Spatial Queries
journal, September 2003

  • Montello, Daniel; Goodchild, Michael; Gottsegen, Jonathon
  • Spatial Cognition & Computation, Vol. 3, Issue 2
  • DOI: 10.1207/s15427633scc032&3_06

Modelling vague places with knowledge from the Web
text, January 2008

  • Jones, Christopher B.; Purves, Ross S.; Clough, Paul D.
  • Taylor & Francis
  • DOI: 10.5167/uzh-7125

Geographical information retrieval
text, January 2008


Works referencing / citing this record:

Representation and analytical models for location-based big data
journal, January 2019

  • Yao, Xiaobai A.; Huang, Haosheng; Jiang, Bin
  • International Journal of Geographical Information Science, Vol. 33, Issue 4
  • DOI: 10.1080/13658816.2018.1562068

Placial analysis of events: a case study on criminological places
journal, January 2019


A name‐led approach to profile urban places based on geotagged Twitter data
journal, December 2019

  • Lai, Juntao; Lansley, Guy; Haworth, James
  • Transactions in GIS, Vol. 24, Issue 4
  • DOI: 10.1111/tgis.12599

Identifying Urban Neighborhood Names through User-Contributed Online Property Listings
journal, September 2018

  • McKenzie, Grant; Liu, Zheng; Hu, Yingjie
  • ISPRS International Journal of Geo-Information, Vol. 7, Issue 10
  • DOI: 10.3390/ijgi7100388

Modeling Housing Rent in the Atlanta Metropolitan Area Using Textual Information and Deep Learning
journal, August 2019

  • Zhou, Xiaolu; Tong, Weitian; Li, Dongying
  • ISPRS International Journal of Geo-Information, Vol. 8, Issue 8
  • DOI: 10.3390/ijgi8080349