DOE PAGES · U.S. Department of Energy
Office of Scientific and Technical Information

Title: A natural language processing and geospatial clustering framework for harvesting local place names from geotagged housing advertisements

Journal Article · International Journal of Geographical Information Science
 [1]; [2]; [3]
  1. Univ. of Tennessee, Knoxville, TN (United States). Department of Geography
  2. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Geographic Information Science and Technology Group
  3. Univ. of Maryland, College Park, MD (United States). Department of Geographical Sciences

Local place names are frequently used by residents living in a geographic region. Such place names may not be recorded in existing gazetteers. Reasons include their vernacular nature, their relative insignificance to a gazetteer covering a large area (e.g. the entire world), their recent establishment (e.g. the name of a newly opened shopping center), and others. Although they are not always recorded, local place names play important roles in many applications, from supporting public participation in urban planning to locating victims in disaster response. In this paper, we propose a computational framework for harvesting local place names from geotagged housing advertisements. We use advertisements posted on locally oriented websites, such as Craigslist, where local place names are often mentioned. The proposed framework consists of two stages: natural language processing (NLP) and geospatial clustering. The NLP stage examines the textual content of the advertisements and extracts place name candidates. The geospatial clustering stage uses the coordinates associated with the extracted candidates and performs multiscale geospatial clustering to filter out non-place names. We evaluate our framework by comparing its performance with that of six baselines. Finally, we compare our results with four existing gazetteers to demonstrate the not-yet-recorded local place names discovered by our framework.
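The two-stage design summarized in the abstract can be illustrated with a small sketch. This is a hypothetical illustration, not the authors' implementation. Several parts are assumptions chosen only for clarity:

- A regex over capitalized word runs stands in for the NLP stage.
- DBSCAN is used as one possible density-based clustering method.
- The scales, `min_pts`, and `min_share` values are illustrative.
- All function names (`extract_candidates`, `dbscan_labels`, `finest_cluster_scale`, `harvest`) are invented for this example.
- Coordinates are assumed to be already projected to kilometres.
- The name "Fulton Park" is made up.

```python
import math
import re
from collections import defaultdict

# Hypothetical stand-in for the NLP stage: runs of capitalized words.
# A real system would use a trained named entity recognizer instead.
CAPS_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*")


def extract_candidates(text):
    """Return the set of capitalized word runs found in one advertisement."""
    return set(CAPS_RUN.findall(text))


def dbscan_labels(points, eps, min_pts):
    """Minimal O(n^2) DBSCAN on planar (x, y) points; label -1 marks noise."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        xi, yi = points[i]
        return [j for j in range(n)
                if math.hypot(points[j][0] - xi, points[j][1] - yi) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbors(j)
            if len(j_seeds) >= min_pts:  # only core points expand the cluster
                queue.extend(j_seeds)
        cluster += 1
    return labels


def finest_cluster_scale(points, scales=(0.5, 1.0, 2.0), min_pts=3, min_share=0.6):
    """Return the smallest eps (km) at which one cluster holds at least
    min_share of the points, or None if no scale qualifies."""
    if len(points) < min_pts:
        return None
    for eps in sorted(scales):
        sizes = defaultdict(int)
        for label in dbscan_labels(points, eps, min_pts):
            if label != -1:
                sizes[label] += 1
        if sizes and max(sizes.values()) / len(points) >= min_share:
            return eps
    return None


def harvest(ads, **params):
    """ads: iterable of (text, (x_km, y_km)) pairs.
    Returns {candidate: finest qualifying scale} for spatially concentrated names."""
    coords = defaultdict(list)
    for text, xy in ads:
        for name in extract_candidates(text):
            coords[name].append(xy)
    result = {}
    for name, pts in coords.items():
        scale = finest_cluster_scale(pts, **params)
        if scale is not None:
            result[name] = scale
    return result


# Toy data: "Fulton Park" is a made-up local name; "Spacious" is a
# capitalized word that appears all over the (hypothetical) city.
ads = [
    ("Spacious loft near Fulton Park", (10.0, 10.0)),
    ("Quiet unit by Fulton Park", (10.2, 10.1)),
    ("Fulton Park studio, Spacious kitchen", (9.9, 10.3)),
    ("Spacious house with yard", (30.0, 5.0)),
    ("Spacious condo downtown", (5.0, 35.0)),
    ("Spacious room available", (38.0, 38.0)),
]
print(harvest(ads))  # {'Fulton Park': 0.5}
```

Raising `eps` in DBSCAN can only grow or merge clusters, because core points stay core as their neighborhoods widen. So a candidate that passes at a fine scale also passes at every coarser one. The sketch therefore scans scales from fine to coarse and reports the first that succeeds, a rough proxy for the extent of the named place. The upper bound of the scale list is what rejects words like "Spacious" that are scattered across the whole area.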

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE
Grant/Contract Number:
AC05-00OR22725
OSTI ID:
1435186
Journal Information:
International Journal of Geographical Information Science, Vol. 33, Issue 4; ISSN 1365-8816
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 35 works
Citation information provided by Web of Science

Cited By (5)

A name‐led approach to profile urban places based on geotagged Twitter data journal December 2019
Placial analysis of events: a case study on criminological places journal January 2019
Representation and analytical models for location-based big data journal January 2019
Identifying Urban Neighborhood Names through User-Contributed Online Property Listings journal September 2018
Modeling Housing Rent in the Atlanta Metropolitan Area Using Textual Information and Deep Learning journal August 2019