OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: A Hybrid Classification Scheme for Mining Multisource Geospatial Data

Abstract

Supervised learning methods such as Maximum Likelihood (ML) are often used in land cover (thematic) classification of remote sensing imagery. The ML classifier relies exclusively on the spectral characteristics of thematic classes, whose statistical distributions often overlap. The spectral response distributions of thematic classes depend on many factors, including elevation, soil types, and atmospheric conditions at the time of data acquisition. A second problem with statistical classifiers is that they require a large number of accurate training samples, which are often costly and time-consuming to acquire over large geographic regions. With the increasing availability of geospatial databases, it is possible to exploit the knowledge derived from these ancillary datasets to improve classification accuracy even when the class distributions overlap heavily. Likewise, newer semi-supervised techniques can improve the parameter estimates of statistical models by using a large number of easily available unlabeled training samples. Unfortunately, there is no convenient multivariate statistical model that can be employed for multisource geospatial databases. In this paper we present a hybrid semi-supervised learning algorithm that effectively exploits freely available unlabeled training samples from multispectral remote sensing images and also incorporates ancillary geospatial databases. We have conducted several experiments on real datasets, and our new hybrid approach shows over 15% improvement in classification accuracy over conventional classification schemes.
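
The semi-supervised part of the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes Gaussian class-conditional densities (the usual ML classifier model) and a plain EM loop in which labeled samples keep their known class and unlabeled samples receive soft class memberships. Ancillary geospatial data are not modeled here, and the function names (`fit_semi_supervised`, `predict`) and the `reg` smoothing term are hypothetical choices made for illustration.

```python
import numpy as np


def log_gaussian(X, mean, cov):
    """Log density of a multivariate normal evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def class_log_scores(X, priors, means, covs):
    """log p(x | class) + log p(class) for every sample and class, shape (n, k)."""
    return np.column_stack([
        np.log(priors[j]) + log_gaussian(X, means[j], covs[j])
        for j in range(len(priors))
    ])


def m_step(X, R, reg):
    """Re-estimate priors, means, and covariances from responsibilities R (n, k)."""
    n_k = R.sum(axis=0)
    priors = n_k / n_k.sum()
    means = (R.T @ X) / n_k[:, None]
    covs = []
    for j in range(R.shape[1]):
        diff = X - means[j]
        cov = (R[:, j, None] * diff).T @ diff / n_k[j]
        # Assumption: a small ridge keeps the covariance invertible.
        covs.append(cov + reg * np.eye(X.shape[1]))
    return priors, means, np.array(covs)


def fit_semi_supervised(X_lab, y_lab, X_unl, n_classes, n_iter=20, reg=1e-6):
    """EM for a Gaussian ML classifier using labeled and unlabeled samples."""
    # Labeled samples keep fixed one-hot responsibilities throughout.
    R_lab = np.eye(n_classes)[y_lab]
    # Initial estimates come from the labeled samples alone.
    priors, means, covs = m_step(X_lab, R_lab, reg)
    X_all = np.vstack([X_lab, X_unl])
    for _ in range(n_iter):
        # E-step: soft class memberships for the unlabeled samples.
        scores = class_log_scores(X_unl, priors, means, covs)
        scores -= scores.max(axis=1, keepdims=True)
        R_unl = np.exp(scores)
        R_unl /= R_unl.sum(axis=1, keepdims=True)
        # M-step: update parameters from labeled and unlabeled samples together.
        priors, means, covs = m_step(X_all, np.vstack([R_lab, R_unl]), reg)
    return priors, means, covs


def predict(X, priors, means, covs):
    """Assign each sample to the class with the highest posterior score."""
    return class_log_scores(X, priors, means, covs).argmax(axis=1)
```

In this sketch every class needs at least one labeled sample, and `y_lab` must be an integer array with values in `0..n_classes-1`. With only a handful of labeled pixels per class, the unlabeled samples mainly serve to stabilize the mean and covariance estimates.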

Authors:
Vatsavai, Raju [1]; Bhaduri, Budhendra L. [1]
  1. ORNL
Publication Date:
2007
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
Work for Others (WFO)
OSTI Identifier:
978780
DOE Contract Number:
DE-AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: ICDM International Workshop on Spatial and Spatio-temporal Data Mining (SSTDM), Omaha, NE, USA, October 28, 2007
Country of Publication:
United States
Language:
English
Subject:
54 ENVIRONMENTAL SCIENCES; ACCURACY; ALGORITHMS; AVAILABILITY; CLASSIFICATION; DATA ACQUISITION; LEARNING; MINING; REMOTE SENSING; SOILS; SPECTRAL RESPONSE; STATISTICAL MODELS; TRAINING; MLC; EM; Semisupervised Learning

Citation Formats

Vatsavai, Raju, and Bhaduri, Budhendra L. A Hybrid Classification Scheme for Mining Multisource Geospatial Data. United States: N. p., 2007. Web.
Vatsavai, Raju, & Bhaduri, Budhendra L. (2007). A Hybrid Classification Scheme for Mining Multisource Geospatial Data. United States.
Vatsavai, Raju, and Bhaduri, Budhendra L. 2007. "A Hybrid Classification Scheme for Mining Multisource Geospatial Data". United States.
@inproceedings{osti_978780,
title = {A Hybrid Classification Scheme for Mining Multisource Geospatial Data},
author = {Vatsavai, Raju and Bhaduri, Budhendra L.},
abstractNote = {Supervised learning methods such as Maximum Likelihood (ML) are often used in land cover (thematic) classification of remote sensing imagery. The ML classifier relies exclusively on the spectral characteristics of thematic classes, whose statistical distributions often overlap. The spectral response distributions of thematic classes depend on many factors, including elevation, soil types, and atmospheric conditions at the time of data acquisition. A second problem with statistical classifiers is that they require a large number of accurate training samples, which are often costly and time-consuming to acquire over large geographic regions. With the increasing availability of geospatial databases, it is possible to exploit the knowledge derived from these ancillary datasets to improve classification accuracy even when the class distributions overlap heavily. Likewise, newer semi-supervised techniques can improve the parameter estimates of statistical models by using a large number of easily available unlabeled training samples. Unfortunately, there is no convenient multivariate statistical model that can be employed for multisource geospatial databases. In this paper we present a hybrid semi-supervised learning algorithm that effectively exploits freely available unlabeled training samples from multispectral remote sensing images and also incorporates ancillary geospatial databases. We have conducted several experiments on real datasets, and our new hybrid approach shows over 15\% improvement in classification accuracy over conventional classification schemes.},
booktitle = {ICDM International Workshop on Spatial and Spatio-temporal Data Mining (SSTDM), Omaha, NE, USA},
place = {United States},
year = {2007},
month = {jan}
}

Conference:
Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.

  • Supervised learning methods such as Maximum Likelihood (ML) are often used in land cover (thematic) classification of remote sensing imagery. The ML classifier relies exclusively on the spectral characteristics of thematic classes, whose statistical distributions (class conditional probability densities) often overlap. The spectral response distributions of thematic classes depend on many factors, including elevation, soil types, and ecological zones. A second problem with statistical classifiers is that they require a large number of accurate training samples (10 to 30 |dimensions|), which are often costly and time-consuming to acquire over large geographic regions. With the increasing availability of geospatial databases, it is possible to exploit the knowledge derived from these ancillary datasets to improve classification accuracy even when the class distributions overlap heavily. Likewise, newer semi-supervised techniques can improve the parameter estimates of statistical models by using a large number of easily available unlabeled training samples. Unfortunately, there is no convenient multivariate statistical model that can be employed for multisource geospatial databases. In this paper we present a hybrid semi-supervised learning algorithm that effectively exploits freely available unlabeled training samples from multispectral remote sensing images and also incorporates ancillary geospatial databases. We have conducted several experiments on real datasets, and our new hybrid approach shows a 25 to 35% improvement in overall classification accuracy over conventional classification schemes.
  • In many practical situations, thematic classes cannot be discriminated by spectral measurements alone. Often one needs additional features such as population density, road density, wetlands, elevation, and soil types, which are discrete attributes. Remote sensing image features, on the other hand, are continuous attributes. Finding a suitable statistical model and estimating its parameters is a challenging task in multisource (e.g., discrete and continuous attributes) data classification. In this paper we present a semi-supervised learning method that assumes the samples were generated by a mixture model in which each component can be either a continuous or a discrete distribution. The overall classification accuracy of the proposed method improved by 12% in our initial experiments.
  • A systematic nomenclature and classification scheme is proposed for passive space heating and cooling systems. It is based upon the mode of energy transport to and from the space and the environmental resource from which the energy is received or to which it is discharged. A number of passive and hybrid space heating and cooling systems are characterized.
  • This paper introduces PADMA (PArallel Data Mining Agents), a parallel agent based system for scalable text classification. PADMA contains modules for (1) parallel data accessing operations, (2) parallel hierarchical clustering, and (3) web-based data visualization. This paper introduces the general architecture of PADMA and presents a detailed description of its different modules.
  • Demand-side management studies pursue goals such as reducing maximum electrical energy demand and strategic demand growth, which may lead to specific actions such as rate adjustment or the conservation or saving of energy resources. To support these strategic decisions, it is necessary to have typical models of customer consumption behavior. A sample of 112 monthly consumption files of selected energy users was used to classify customers using a data mining technique based on an ART (Adaptive Resonance Theory) algorithm modified with a Euclidean distance measure. Each file contains one load curve per day, and each curve consists of 96 consumption values (kWh), one for each 15-minute interval. The algorithm gives results very similar to those obtained by iteratively using a combination of traditional statistical and visualization approaches. However, with the mining algorithm, the time invested in getting the results is on the order of minutes, as opposed to hours with the traditional approaches. The paper describes in detail the methodology, algorithm, and software system used to classify energy customers.